
Richard Evans
University of Wolverhampton · Research Institute in Information and Language Processing (RIILP)
Master of Science: Cognitive Science and Natural Language
About
48
Publications
8,510
Reads
776
Citations
Introduction
Richard Evans currently works at the Research Institute in Information and Language Processing (RIILP), University of Wolverhampton. Richard does research in Computational Linguistics. His current project is 'Sentence Simplification for Language Processing'.
Additional affiliations
October 2011 - present
September 2003 - September 2011
October 1998 - September 2003
Education
February 2012 - January 2020
October 1995 - September 1996
October 1992 - September 1995
University of Wales (Bangor)
Field of study
- Linguistics
Publications (48)
Identifying words which may cause difficulty for a reader is an essential step in most lexical text simplification systems prior to lexical substitution and can also be used for assessing the readability of a text. This task is commonly referred to as complex word identification (CWI) and is often modelled as a supervised classification problem. Fo...
Encapsulators are linguistic units which establish coherent referential connections to the preceding discourse in a text. In this paper, we address the challenge of automatically analysing the pronominal encapsulator ello in Spanish text. Our method identifies, for each occurrence, the antecedent of the pronoun (including its grammatical type), the...
This paper presents the results and main findings of SemEval-2021 Task 1 - Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex Corpus (Shardlow et al., 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a f...
The first step in most text simplification is to predict which words are considered complex for a given target population before carrying out lexical substitution. This task is commonly referred to as Complex Word Identification (CWI) and it is often modelled as a supervised classification problem. For training such systems, annotated datasets in w...
When processing a text, humans and machines must disambiguate between different uses of the pronoun it, including non-referential, nominal anaphoric or clause anaphoric ones. In this paper, we use eye-tracking data to learn how humans perform this disambiguation. We use this knowledge to improve the automatic classification of it. We show that by u...
In this paper, we report on the extrinsic evaluation of an automatic sentence simplification method with respect to two NLP tasks: semantic role labelling (SRL) and information extraction (IE). The paper begins with our observation of challenges in the intrinsic evaluation of sentence simplification systems, which motivates the use of extrinsic eva...
In this paper, we report on the extrinsic evaluation of an automatic sentence simplification method with respect to two NLP tasks: semantic role labelling (SRL) and information extraction (IE). The paper begins with our observation of challenges in the intrinsic evaluation of sentence simplification systems, which motivates the use of extrinsic ev...
This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two...
Autistic Spectrum Disorder (ASD) is a neurodevelopmental disorder which has a life-long impact on the lives of people diagnosed with the condition. In many cases, people with ASD are unable to derive the gist or meaning of written documents due to their inability to process complex sentences, understand non-literal text, and understand uncommon and...
Given the lack of large user-evaluated corpora in disability-related NLP research (e.g. text simplification or readability assessment for people with cognitive disabilities), the question of choosing suitable training data for NLP models is not straightforward. The use of large generic corpora may be problematic because such data may not reflect th...
People with autism experience various reading comprehension difficulties, which is one explanation for the early school dropout, reduced academic achievement and lower levels of employment in this population. To overcome this issue, content developers who want to make their textbooks, websites or social media accessible to people with autism (and t...
This paper presents our investigation of the ability of 33 readability indices to account for the reading comprehension difficulty posed by texts for people with autism. The evaluation by autistic readers of 16 text passages is described, a process which led to the production of the first text collection for which readability has been evaluated by...
At present, few resources are available to support the development and evaluation of automatic text simplification (TS) systems for specific target populations. These comprise parallel corpora consisting of texts in their original form and in a form that is more accessible for different categories of target reader, including neuro...
This paper investigates non-destructive simplification, a type of syntactic text simplification which focuses on extracting embedded clauses from structurally complex sentences and rephrasing them without affecting their original meaning. This process reduces the average sentence length and complexity to make text simpler. Although relevant for hu...
Syntactically complex sentences constitute an obstacle for some people with Autistic Spectrum Disorders. This paper evaluates a set of simplification rules specifically designed for tackling complex and compound sentences. In total, 127 different rules were developed for the rewriting of complex sentences and 56 for the rewriting of compound senten...
Introduction:
Over recent decades, numerous studies have documented the reading comprehension difficulties shown by people with autism spectrum disorder (ASD), including those with preserved intelligence. These difficulties can shape their educational path and directly affect social inclusion, autonomy and access to employment....
The occurrence of syntactic phenomena such as coordination and subordination is characteristic of long, complex sentences. Text simplification systems need to detect and categorise constituents in order to generate simpler sentences. These constituents are typically bounded or linked by signs of syntactic complexity, which include conjunctions, com...
This article presents a new annotation scheme for syntactic complexity in text which has the advantage over other existing syntactic annotation schemes that it is easy to apply, is reliable and it is able to encode a wide range of phenomena. It is based on the notion that the syntactic complexity of sentences is explicitly indicated by signs such a...
In spite of the great number of diachronic studies in various languages, the methodology for investigating language change has not evolved much in the last fifty years. Following the progressive trends in other fields, in this paper, we argue for the adoption of a machine learning approach in diachronic studies, which could offer a more efficient a...
This study presents the results of an initial phase of a project seeking to convert texts into a more accessible form for people with autism spectrum disorders by means of text simplification technologies. Random samples of Simple Wikipedia articles are compared with texts from News, Health, and Fiction genres using four standard readability indic...
This article describes research aimed at improving the accuracy of an information extraction (IE) system by treating coordinate structures systematically. Commas, coordinating conjunctions, and adjacent comma-conjunction pairs are considered to be potential indicators of coordination in natural language. A recursive algorithm is implemented which c...
The present article is concerned with the problem of automatic database population via information extraction (IE) from web pages obtained from heterogeneous sources, such as those retrieved by a domain crawler. Specifically, we address the task of filling single multi-field templates from individual documents, a common scenario that involves free-...
In anaphora resolution for English, animacy identification can play an integral role in the application of agreement restrictions between pronouns and candidates, and as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic me...
Papers discussing anaphora resolution algorithms or systems usually focus on the intrinsic evaluation of the algorithm/system and not on the issue of extrinsic evaluation. In the context of anaphora resolution, extrinsic evaluation concerns the impact of an anaphora resolution module on a larger NLP system of which it is part. In this paper we expl...
Since the mid-90s, corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed. S...
At present, information systems combining crawling and information extraction (IE) technologies attract considerable research and industrial interest. In this paper, we present an algorithm that exploits techniques for unsupervised IE pattern acquisition in order to facilitate identification of web pages containing information relevant to the IE task.
In this paper, a system for Named Entity Recognition in the Open domain (NERO) is described. It is concerned with recognition of various types of entity, types that will be appropriate for Information Extraction in any scenario context. The recognition task is performed by identifying normally capitalised phrases in a document and then submitting q...
Information about the animacy of nouns is important for a wide range of tasks in NLP. In this paper, we present a method for determining the animacy of English nouns using WordNet and machine learning techniques. Our method firstly categorises the senses from WordNet using an annotated corpus and then uses this information in order to classify noun...
This paper describes a new, advanced and completely revamped version of Mitkov’s knowledge-poor approach to pronoun resolution [21]. In contrast to most anaphora resolution approaches, the new system, referred to as MARS, operates in fully automatic mode. It benefits from purpose-built programs for identifying occurrences of nonnominal anaphora (in...
In the majority of cases, the pronoun it illustrates nominal anaphora, tending to refer back to another noun phrase in the text. However, in a significant minority of cases, the pronoun is used in exceptional ways that fail to demonstrate strict nominal anaphora. The identification of these uses of it is important in all fields where...
The pronoun it is noted to be used in a variety of non-nominal ways. The identification of non-nominal pronouns is important in information retrieval, machine translation and automatic summarisation. Given that previous work has only tackled a subset of those non-nominal uses, a machine learning method for identification of all instances of non-nom...
The paper argues that a promising way to improve the success rate of preference-based anaphora resolution algorithms is the use of machine learning. The paper outlines MARS - a program for automatic resolution of pronominal anaphors and describes an experiment which we have conducted to optimise the success rate of MARS with the help of a genetic a...
In text processing, the researcher often has to consider characteristics of documents in spaces with high dimensionality and noise, which makes them difficult to understand properly. One solution to this problem is to use machine learning techniques over complex training data. In this paper we present three experiments (using artificial neural network...
Some references to human beings can be identified in English texts using named entity (Chinchor, 1997) and pronoun recognition but in some genres this still leaves a large number of references to people unidentified. The remaining noun phrases have no overt marking as to their animacy and clues as to the appropriate classification of a NP as animat...
This paper investigates the causes of the comparatively low success rates in finding the antecedents of plural pronouns as compared to finding antecedents of singular pronouns. We are trying to show experimentally that considering morphological agreement as a strong constraint in pronoun resolution results in the erroneous interpretation of almost...
The paper summarises the work of the Research Group in Computational Linguistics at the University of Wolverhampton towards the production of much needed annotated resources for evaluation and training of anaphora resolution systems. In particular, it describes the annotating tools developed to support the annotation, the corpora annotated and the...
In this paper we present two applications that depend on annotated corpora for their implementation, evaluation and improvement. The first is an automatic anaphora resolution system. After describing the algorithm we discuss the importance of corpora for the tasks of evaluation and automatic scoring and the development of a coreferentially annotate...
In this paper we present an automatic mechanism for bilingual (Spanish-English) alignment of anaphoric expressions. For this purpose, two anaphora resolution systems were used. Both are based on linguistic preferences and constraints, for Spanish (SUPPAR) and for English (MARS). These systems have been independently developed and each of them is pr...
In Information Extraction, a very common task is to extract facts about a single event or entity from an entire document such as a personal homepage, a job or a seminar announcement. The double classification method approaches this task with two automatic classifiers. The first one classifies larger document fragments to roughly indicate which of...
In this paper several methods for animacy recognition are evaluated. Each method is more complex than the previous one and involves more resources and, as a result, more computation. When assessing the performance of these methods we consider three factors: the results of an intrinsic evaluation, the results of an extrinsic evaluation,...
This paper investigates the causes of the comparatively low success rates in finding the antecedents of plural pronouns as compared to finding antecedents of singular pronouns. We are trying to show experimentally that considering morphological agreement as a strong constraint in pronoun resolution results in the erroneous interpretation of almost...
This paper describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. It describes the main grammatical rules imposing subject NP deletion and referential constraints in Brazilian Portuguese, in order to allow a correct identification of...
Paper presented at the Proceedings of RRNLP-2002, Alicante, Spain. In this paper, the behaviour of an existing pronominal anaphora resolution system is modified so that different types of pronoun are treated in different ways. Weights are derived using a genetic algorithm for the outcomes of tests applied by this branching algorithm. Detailed evalu...
This book constitutes revised selected papers of the 6th Discourse Anaphora and Anaphor Resolution Colloquium, DAARC 2007, held in Lagos, Portugal in March 2007. The 13 revised full papers presented were carefully reviewed and selected from 60 initial submissions during two rounds of reviewing and improvements. The papers are organized in topical s...
Projects (4)
To develop methods to assess the readability of input texts and to improve their accessibility for people with autism. This is with a view to the development of automatic tools to assist in this enterprise.
The development of automatic methods to identify coreferential phrases in texts.