Conference Paper

Unknown Word Guessing and Part-of-Speech Tagging Using Support Vector Machines.

Conference: Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, November 27-30, 2001, Hitotsubashi Memorial Hall, National Center of Sciences, Tokyo, Japan
Source: DBLP

ABSTRACT The accuracy of part-of-speech (POS) tagging for unknown words is substantially lower than that for known words. Considering the high accuracy rate of up-to-date statistical POS taggers, unknown words account for a non-negligible portion of the errors. This paper describes POS prediction for unknown words using Support Vector Machines. We achieve high accuracy in POS tag prediction using substrings and surrounding context as the features. Furthermore, we integrate this method with a practical English POS tagger, and achieve an accuracy of 97.1%, higher than conventional approaches.
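
The abstract names substrings and surrounding context as the features but does not list the exact feature set. The following is a minimal sketch of what such features could look like, not the authors' implementation. The function name `unknown_word_features`, the maximum affix length of 4, the context window of 2 tokens, and the `<PAD>` marker are all assumptions made for illustration.

```python
def unknown_word_features(tokens, index, max_affix_len=4, window=2):
    """Collect illustrative substring and context features for tokens[index].

    All parameter defaults are assumptions; the abstract does not specify
    affix lengths or the context window size.
    """
    word = tokens[index]
    features = set()

    # Substring features: every prefix and suffix up to max_affix_len characters.
    for n in range(1, min(max_affix_len, len(word)) + 1):
        features.add(f"prefix={word[:n]}")
        features.add(f"suffix={word[-n:]}")

    # Context features: neighbouring words within the window, lowercased.
    # Positions outside the sentence get a padding marker.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        pos = index + offset
        if 0 <= pos < len(tokens):
            neighbour = tokens[pos].lower()
        else:
            neighbour = "<PAD>"
        features.add(f"word[{offset:+d}]={neighbour}")

    return features


if __name__ == "__main__":
    sentence = ["The", "glorp", "jumped", "quickly"]
    for feature in sorted(unknown_word_features(sentence, 1)):
        print(feature)
```

Each feature string would typically be mapped to one dimension of a sparse binary vector before being passed to an SVM classifier; that step is omitted here to keep the sketch free of external dependencies.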

  • Source
    ABSTRACT: Part-of-speech (POS) tagging is a vital task in Natural Language Processing (NLP) for any language. It involves analysing the construction, behaviour, and dynamics of the language, knowledge that can be utilized in computational linguistic analysis and automation applications. In this context, dealing with unknown words (words that do not appear in the lexicon) is also an important task, since NLP systems are used in more and more new applications. One aid in predicting the lexical categories of unknown words is the syntactic knowledge of the language. In this research, the distinction between open-class and closed-class words, together with syntactic features of the language, is used to predict the lexical categories of unknown words in the tagging process. An experiment is performed to investigate the ability of the approach to parse unknown words using syntactic knowledge without human intervention. The experiment shows that the performance of the tagging process is enhanced when the word class distinction is used together with syntactic rules to parse sentences containing unknown words in the Sinhala language.
  • Source
    ABSTRACT: This paper presents a semantic setup for Dutch on the basis of deep processing. The parser and generator Delilah computes a system of logical forms that is both semantically adequate, and instrumental in processing tasks like disambiguation and inference. The logical forms are derivationally related but differ as to the level of specification and exploitability. The semantic setup is new, and is likely to be the first computed, fully specified semantics for Dutch. One of the logical forms introduces a new way of compiling out semantic dependencies. The resulting system is discussed at the crossroad of logical semantics and computational linguistics.
