Carly Stevens’s research while affiliated with Lancaster University and other places

Publications (2)


Fig. 1. Types of information extracted from the Journal of Botany (figure (a) courtesy of the Botany journal (1885)).
Fig. 2. Methodology proposed to extract information from historical text.
Fig. 3. Creation of plant species pattern file.
Fig. 4. MongoDB queries showing the extracted entities together with their contextual information.
Fig. 5. Vocabularies for the linked data model.

Automating the extraction of information from a historical text and building a linked data model for the domain of ecology and conservation science
  • Article
  • Full-text available

October 2022 · 117 Reads · 9 Citations

Heliyon

Robert Smail · Carly Stevens · Gordon Blair

Data heterogeneity is a pressing issue and is further compounded when the data come from textual documents. The unstructured nature of such documents means that collating, comparing and analysing the information they contain can be a challenging task. Automating these processes can help to surface knowledge that otherwise remains buried in them. Moreover, integrating the information extracted from the documents with other related information enables more information-rich queries. In this context, the paper presents a comprehensive review of text extraction and data integration techniques to enable this automation process in an ecological context. The paper investigates the extraction of valuable floristic information from a historical botany journal. The purpose of this extraction is to bring to light relevant pieces of information contained within the document. In addition, the paper explores the need to integrate the extracted information with other related information from disparate sources. All the information is then rendered into a queryable form so that unified queries can be made. To achieve this, the paper uses a combination of Machine Learning, Natural Language Processing and Semantic Web techniques. The proposed approach is demonstrated through the information extracted from the journal and the information-rich queries made possible by the integration process. The paper shows that the approach has merit in extracting relevant information from the journal, discusses how the machine learning models have been designed to classify complex information, and gives a measure of their performance. The paper also shows that the approach has merit with regard to query time when querying floristic information from a multi-source linked data model.

Fig. 3. Match rates across the corpus
Fig. 4. Geographical distribution of plant species matched from across the corpus
Uncovering Environmental Change in the English Lake District: Using Computational Techniques to Trace the Presence and Documentation of Historical Flora

October 2021 · 80 Reads · 3 Citations

Digital Scholarship in the Humanities

Robert Smail · Chris Donaldson · [...] · Carly Stevens

There is a lack of concrete knowledge about floristic change in Britain before the mid-20th century. Relevant evidence is available, but it is principally contained in disparate historical sources. In this article, we demonstrate how such sources can be efficiently collated and analysed through the implementation of state-of-the-art computational-linguistic and historical-geographic information systems (GIS) techniques. We do so through a case study that focuses on the floristic history of the English Lake District. This region has been selected because of its outstanding cultural and environmental value and because it has been extensively and continuously documented since the late-17th century. We outline how natural language processing (NLP) techniques can be integrated with Kew’s Plants of the World Online database to enable temporal shifts in plant-naming conventions to be more accurately traced across a heterogeneous corpus of texts published between 1682 and 1904. Through collocate analysis and automated geoparsing techniques, the geographies associated with these plant names are then identified and extracted. Finally, we use GIS to demonstrate the potential of this data set for geo-temporal analysis and for revealing the historical distribution of Lake District flora. In outlining our methodology, this article indicates how the spatial and digital humanities can benefit research both in environmental history and in the environmental sciences more widely.

Citations (2)


... Analysing the scientific literature is a complementary approach, but the explosion in the volume of publications makes accessing the relevant information tedious. Knowledge extraction aims to address this problem by automatically identifying and synthesising the key information in documents using natural language processing (NLP) methods, in fields as varied as ecology (Nundloll et al., 2022) or land-use planning (Koptelov et al., 2023). Extracting knowledge from scientific documents on a specific topic requires domain-specific expertise to ensure accuracy in the construction and validation of the extracted data. ...

Reference:

Extraction de connaissances à partir de données textuelles : application à la découverte de règles de changement d'usage des sols
Automating the extraction of information from a historical text and building a linked data model for the domain of ecology and conservation science

Heliyon

... Missing data were also the result of current conditions rather than past ones. For instance, authors lacked access to resources that would permit a more thorough or complete analysis because the physical artefacts were either archived in inaccessible locations [46,50] or even copyrighted [4,77,109]. Some publications also mention lacking the computational power to run better or more complete versions of the data analysis [103]. ...

Uncovering Environmental Change in the English Lake District: Using Computational Techniques to Trace the Presence and Documentation of Historical Flora

Digital Scholarship in the Humanities