Article

Improving the lexical coverage of English compound adjectives in syntactic parsing


Abstract

The present paper addresses the question of how, in syntactic parsing, the coverage of words in previously unseen text can be improved, using English adjectives as a case study. Working on the assumption that most new words entering the language are built from existing words through word-formation processes, we investigate the role that different word-formation processes play in the formation of English adjectives. An analysis of adjectives in the British National Corpus (BNC) shows that, for adjectives, compounding is the most productive word-formation process. Moreover, compound adjectives are not formed by combining bases at will; rather, a limited set of fairly simple rules restricts which bases can co-occur. This makes it feasible to handle compound adjectives effectively: in a first implementation, 88.68% of a set of 30,561 compound adjectives derived from the BNC were correctly identified as such. Incorporating the rules into the grammar underlying the Pelican parser yields a 7.65% increase in the parser's coverage of a subset of 10,123 sentences taken from the Leipzig corpus.
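The abstract does not spell out the rules themselves, so the following is only a minimal sketch of the general idea: a compound adjective is accepted when the word classes of its bases match one of a small set of permitted combinations. The toy lexicon, the five patterns (e.g. noun + adjective as in *sugar-free*), the suffix handling, and the names `readings` and `analyse_compound` are all hypothetical illustrations, not the rules implemented in the Pelican grammar. The sketch handles only hyphenated two-part compounds.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: base word -> coarse word classes.
# A real system would consult the parser's own lexicon instead.
LEXICON: Dict[str, Set[str]] = {
    "sugar": {"N"}, "free": {"A"}, "blue": {"A"}, "dark": {"A"},
    "green": {"A"}, "eye": {"N"}, "hand": {"N"}, "record": {"N", "V"},
    "break": {"V"}, "well": {"ADV"}, "know": {"V"},
}

# Assumed endings on the second base, each mapped to a function that
# proposes candidate stems (e.g. "eyed" -> "ey", "eye").
SUFFIX_STEMS: Dict[str, Callable[[str], List[str]]] = {
    "ed": lambda w: [w[:-2], w[:-1]],          # handed -> hand, eyed -> eye
    "ing": lambda w: [w[:-3], w[:-3] + "e"],   # breaking -> break
    "n": lambda w: [w[:-1]],                   # known -> know
}

# Illustrative co-occurrence rules (assumed, not from the paper):
# (class of base 1, class of base 2, ending required on base 2).
RULES: Set[Tuple[str, str, str]] = {
    ("N", "A", ""),      # sugar-free
    ("A", "A", ""),      # dark-green
    ("A", "N", "ed"),    # blue-eyed
    ("N", "V", "ing"),   # record-breaking
    ("ADV", "V", "n"),   # well-known
}


def readings(part: str) -> Set[Tuple[str, str]]:
    """All (word class, ending) analyses of one element of a compound."""
    result = {(cls, "") for cls in LEXICON.get(part, set())}
    for suffix, stems in SUFFIX_STEMS.items():
        if part.endswith(suffix) and len(part) > len(suffix):
            for stem in stems(part):
                result |= {(cls, suffix) for cls in LEXICON.get(stem, set())}
    return result


def analyse_compound(token: str) -> Optional[Tuple[str, str, str]]:
    """Return the rule licensing a hyphenated two-part compound adjective,
    or None if no rule allows its bases to co-occur."""
    parts = token.lower().split("-")
    if len(parts) != 2:
        return None
    first, second = parts
    # In every rule above, the first base is uninflected.
    first_classes = {cls for cls, ending in readings(first) if ending == ""}
    # Sorted iteration keeps the result deterministic if several rules match.
    for cls2, ending in sorted(readings(second)):
        for cls1 in sorted(first_classes):
            if (cls1, cls2, ending) in RULES:
                return (cls1, cls2, ending)
    return None
```

For example, `analyse_compound("record-breaking")` returns `("N", "V", "ing")`, while the reversed `"free-sugar"` returns `None` because no rule in this sketch lets an adjective precede a bare noun. Restricting combinations in this way is what lets a parser accept unseen compounds such as *blue-eyed* without listing each one in its lexicon, while still rejecting arbitrary pairings of bases.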
