Kevin Humphreys's research while affiliated with The University of Sheffield and other places

Publications (53)

Article
Full-text available
We describe SUPPLE, a freely-available, open source natural language parsing system, implemented in Prolog, and designed for practical use in language engineering (LE) applications. SUPPLE can be run as a stand-alone application, or as a component within the GATE General Architecture for Text Engineering. SUPPLE is distributed with an example g...
Conference Paper
Full-text available
We describe SUPPLE, a freely-available, open source natural language parsing system, implemented in Prolog, and designed for practical use in language engineering (LE) applications. SUPPLE can be run as a stand-alone application, or as a component within the GATE General Architecture for Text Engineering. SUPPLE is distributed with an example gramm...
Article
We propose a general approach for performing event coreference and for constructing complex event representations, such as those required for information extraction tasks. Our approach is based on a representation which allows a tight coupling between world or conceptual modelling and discourse modelling. The representation and the coreference mec...
Article
We describe the use of coreference chains for the production of text summaries, using a variety of criteria to select a 'best' chain to represent the main topic of a text. The approach has been implemented within an existing MUC coreference system, which constructs a full discourse model of texts, including information about changes of focus, which...
Article
Full-text available
We present an approach to anaphora resolution based on a focusing algorithm, and implemented within an existing MUC (Message Understanding Conference) Information Extraction system, allowing quantitative evaluation against a substantial corpus of annotated real-world texts. Extensions to the basic focusing mechanism can be easily tested, resulting...
Article
reuse. But the pressure towards theoretical diversity means that there is no point attempting to gain agreement, in the short term, on what set of component technologies should be developed or on the informational content or syntax of representations that these components should require or produce. Our response has been to design and implement a so...
Article
We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems.
Article
We describe GATE, the General Architecture for Text Engineering, an integrated visual development environment to support the visual assembly, execution and analysis of modular natural language processing systems. The visual model is an executable data flow program graph, automatically synthesised from data dependency declarations of language proces...
Article
Full-text available
this article we were largely successful with all slots except for the Entity Descriptor slot where scores were 50 % precision and 21 % recall. We will first explain the particular items we failed on, and then discuss why our Entity Descriptor slots were so poor
Article
Full-text available
This document describes LaSIE version 2.1 as embedded within GATE version 1.5.1. It is a technical specification document, not a user guide. It documents each of the modules in LaSIE in terms of: a functional overview, its interface to GATE (annotation input from or output to the GATE document manager), external processes the module may invoke, exter...
Article
Algorithms for performing coreference resolution can only be precisely evaluated given a benchmark corpus of coreference-annotated texts, together with techniques for evaluating the algorithms' output against the corpus. Such a corpus and such techniques have become available for the first time as part of the Message Understanding Conference 6 (MUC...
Article
this paper we present results about an evaluation of both approaches on real-world texts to determine the main drawbacks and advantages of each. We see this as a first step towards answering the question: does adding a notion of focus to a simple coreference mechanism buy you anything?
Article
We describe a technique for the robust interpretation of newswire texts which uses semantic role information about verb complements together with a general co-reference mechanism to extend the constituent structure analysis produced by a partial parser. This technique has the advantage that failure to find a spanning parse of an entire sentence doe...
Article
Full-text available
This paper describes a simple Prolog-based knowledge representation language called XI -- X for cross-classification and I for inheritance -- which is designed to represent knowledge about individuals, about classes of individuals, and about inclusion relations between classes of individuals. XI allows for straightforward definitions of cross-class...
Article
In this paper we describe the application of automatic terminology recognition and classification techniques for two bioinformatics projects: extraction of information about enzymes and metabolic pathways and extraction of information about protein structure, in both cases from scientific journal papers. The techniques we use were adapted from alre...
Article
With the explosive growth of scientific literature in the area of molecular biology, the need to automatically process and extract information from on-line text sources has become increasingly important. In this paper we consider the application of Information Extraction (IE) technology to the extraction of factual information from biological journ...
Article
this article we were largely successful with all slots except for the Entity Descriptor slot where scores were 50 % precision and 21 % recall. We will first explain the particular items we failed on, and then discuss why our Entity Descriptor slots were so poor
Article
this paper concentrates on describing the modifications made to our existing information extraction system to allow it to participate in the Q & A task.
Article
Full-text available
this paper concentrates on describing the modifications made to our existing information extraction system to allow it to participate in the Q & A task.
Article
We describe an approach to finding literal answer strings to natural language questions in large text collections. The approach involves linking an IR system with an NLP system that performs reasonably thorough linguistic analysis. The IR system treats the question as a query and returns a set of top ranked documents or passages. The NLP system par...
Article
The benefits of the effective creation of Information Extraction (IE) in the last ten years, driven by the ARPA TIPSTER programme and the associated MUC evaluations, have been enormous, but it must now be time to ask what research issues face the systems we have built and what we should do next. We suggest that there are two classes of important re...
Article
Information extraction technology, as defined and developed through the US Defense Advanced Research Projects Agency (DARPA) Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper, the application of this technology...
Article
Information extraction technology, as defined and developed through the U.S. DARPA Message Understanding Conferences (MUCs), has proved successful at extracting information primarily from newswire texts and primarily in domains concerned with human activity. In this paper we consider the application of this technology to the extraction of informati...
Article
GATE, the General Architecture for Text Engineering, aims to provide a software infrastructure for researchers and developers working in the area of natural language processing. A version of GATE has now been widely available for three years. In this paper we review the objectives which motivated the creation of GATE and the functionality and desig...
Article
Full-text available
The volume of electronic text in different languages, particularly on the World Wide Web, is growing significantly, and the problem of users who are restricted in the number of languages they read obtaining information from this text is becoming more widespread. This paper investigates some of the issues involved in achieving multilingual Informati...
Conference Paper
The system entered by the University of Sheffield in the question answering track of TREC-8 is the result of coupling two existing technologies - information retrieval (IR) and information extraction (IE). In essence the approach is this: the IR system treats the question as a query and returns a set of top ranked documents or passages; the IE syst...
Article
We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems.
Article
Full-text available
We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on to...
Article
We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules...
Article
We propose a general approach for performing event coreference and for constructing complex event representations, such as those required for information extraction tasks. Our approach is based on a representation which allows a tight coupling between world or conceptual modelling and discourse modelling. The representation and the coreference mech...
Article
Algorithms for performing coreference resolution can only be precisely evaluated given a benchmark corpus of coreference-annotated texts, together with techniques for evaluating the algorithms' output against the corpus. Such a corpus and such techniques have become available for the first time as part of the Message Understanding Conference 6 (MUC...
Article
We present in this paper the coreference mechanism implemented in the M-LaSIE system, a prototype multilingual Information Extraction (IE) system. We describe an experiment in which texts from a parallel French/English corpus were marked up manually and processed by the system following the MUC coreference annotation scheme. This experiment allows...
Article
this document or its parent collection using the API defined by the Tipster people. The full specification of this API is given in [6] which is available from Sheffield's ftp site as
Article
Full-text available
We describe a software environment to support research and development in natural language (NL) engineering. This environment - GATE (General Architecture for Text Engineering) - aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules m...
Article
Full-text available
We describe GGI, a visual system that allows the user to execute an automatically generated data flow graph containing code modules that perform natural language processing tasks. These code modules operate on text documents. GGI has a suite of text visualisation tools that allows the user useful views of the annotation data that is produced by the...
Article
Full-text available
Building on the work of the TIPSTER architecture group, the University of Sheffield Natural Language Processing group have developed GATE, a General Architecture for Text Engineering. GATE implements the TIPSTER document manager, and adds a rich set of graphical tools for the management of modules and the data they produce and consume, and the visu...
Article
Full-text available
We describe GGI, a visual system that allows the user to execute an automatically generated data flow graph containing code modules that perform natural language processing tasks. These code modules operate on text documents. GGI has a suite of text visualization tools that allows the user useful views of the annotation data that is produced by the...
Article
Full-text available
this article. The original poor result is due to the failure to identify the names `McCann-Erickson' and `J. Walter Thompson' as company names. The use of the verb hire in the following piece of text
Article
Full-text available
this paper at the April 1996 TIPSTER workshop, and for extensive comments during the preparation of this paper. References
Article
This paper describes the approach to knowledge representation taken in the LaSIE information extraction (IE) system. Unlike many IE systems that skim texts and use large collections of shallow, domain-specific patterns and heuristics to fill in templates, LaSIE attempts a fuller text analysis, first translating individual sentences to a quasi-logic...
Conference Paper
Full-text available
Appointment scheduling is a problem faced daily by many individuals and organizations. Cooperating agent systems have been developed to partially automate this task. In order to extend the circle of participants as far as possible we advocate the use ...
Conference Paper
Given an information extraction (IE) system that performs an extraction task against texts in one language, it is natural to consider how to modify the system to perform the same task against texts in a different language. More generally, there may be a requirement to do the extraction task against texts in an arbitrary number of different language...
Conference Paper
Full-text available
We describe a software environment to support research and development in natural language (NL) engineering. This environment - GATE (General Architecture for Text Engineering) - aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may b...
Article
this article. The original poor result is due to the failure to identify the names `McCann-Erickson' and `J. Walter Thompson' as company names. The use of the verb hire in the following piece of text
Article
Full-text available
The LaSIE (Large Scale Information Extraction) system has been developed at the University of Sheffield as part of an ongoing research effort into information extraction and, more generally, natural language engineering.
Conference Paper
this article. The original poor result is due to the failure to identify the names `McCann-Erickson' and `J. Walter Thompson' as company names. The use of the verb hire in the following piece of text
Article
Full-text available
e engineering. LaSIE is a single, integrated system that builds up a unified model of a text which is then used to produce outputs for all four of the MUC-6 tasks. Of course this model may also be used for other purposes aside from MUC-6 results generation, for example we currently generate natural language summaries of the MUC-6 scenario resu...

Citations

... A non-TREC system, the START (SynTactic Analysis using Reversible Transformations) QA system, uses another semantic representation called ternary expressions to contribute different semantic information to plain texts so that they are more machine understandable (Katz, 1997). PropBank's (Kingsbury, Palmer, & Marcus, 2002) predicate-argument structure (PAS) also has been exploited in different attempts to enhance the semantic representation of texts (Humphreys, Gaizauskas, Hepple, & Sanderson, 1999; Kawahara, Kaji, & Kurohashi, 2002). A semantic QA system performs differently depending on three aspects: a) the level of semantic representation of texts, b) different semantic techniques for answer processing, and c) the mechanisms for fusing answer lists retrieved by a semantic model with those extracted by other types of answer processing. ...
... Many commercial robots are based on this architecture. Some have four wheels and are very close to car architecture [9]. Others adopt only three wheels [10] for permanent stability or a high number of wheels for improving driveability. ...
... It is formally defined as Text Mining is a non-traditional information retrieval (IR) method whose goal is to reduce the effort required of users to obtain useful information from large computerized text data sources. Traditional Information Retrieval often simultaneously retrieves both "too little" information and "too much" text [16,17]. However, in Information Retrieval (otherwise known as Information Access), no genuinely new information is found. ...
... MALE, ADULT] and "cucumber" as [NON-HUMAN, VEGETABLE], etc. A number of projects in this paradigm have been reported in the past decade, including Basili et al. (1997), Lowe et al. (1997), Lua (1997), Humphreys et al. (1999), Demetriou and Atwell (2001) ...
... If necessary, the user is requested for clarification of the question before the analysis. Various systems provide various techniques of analysis, from extracting a set of keywords, expanding this set using synonyms and morphological variants [24] to partial parsing [22] and employing a hierarchy of questions [18]. Although it is not very frequent, a user model that may contain e.g. user preferences can also be used. ...
... [2] proposed a name identification system called FASTUS. LASIE by [7] and LASIE II by [6] used the concept of a look-up dictionary and grammatical rules to identify the NEs. The main attraction of the ML approach is that it is trainable and can be adapted to different domains. ...
... The system is able to recognize phrases, recognize patterns and merge the incidents. Gaizauskas et al [11] had developed a system called LaSIE which performs lexical, syntactic and semantic analysis to build a single, rich representation of text. ...
... This paper contributes to that body of research, and the literature, by presenting a methodology for "topic modeling" that identifies patterns in the messaging. The paper also contributes to the literature (e.g., Rodgers et al. 1997; Berinato 2016; Liu et al. 2017; Mohammad 2020) on data visualizations for extracting the hidden story and the resulting sentiment from the topic modeling that can be used for marketing campaigns; in this case, it was political campaigns. ...
... Text-mining is a complex task and requires integration of various resources (e.g., terminologies) and reuse of specialized technologies (e.g., solutions for the syntactical analysis of sentences). Integration of these resources has led to the development of standalone solutions that are delivered to scientists (Friedman et al., 2001) or to complex modular solutions that require installation of components like programming language libraries (Gaizauskas et al., 1996). Furthermore, such solutions have an architecture that does not support well integration of bioinformatics services similar to open IT solutions such as Taverna (Hull et al., 2006). ...
... There are numerous implementations of parsers available that can be readily used: SUPPLE [14], RASP [15], MaltParser [16], TurboParser [17], Stanford RNN Parser [18] and YaraParser [8]. Most parsers can be executed as stand-alone tools or as integrated components of more complex NLP frameworks such as GATE [8]. ...