Conference Paper

Unknown Word Guessing and Part-of-Speech Tagging Using Support Vector Machines.

Conference: Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, November 27-30, 2001, Hitotsubashi Memorial Hall, National Center of Sciences, Tokyo, Japan
Source: DBLP

ABSTRACT: The accuracy of part-of-speech (POS) tagging for unknown words is substantially lower than that for known words. Considering the high accuracy rate of up-to-date statistical POS taggers, unknown words account for a non-negligible portion of the errors. This paper describes POS prediction for unknown words using Support Vector Machines. We achieve high accuracy in POS tag prediction using substrings and surrounding context as the features. Furthermore, we integrate this method with a practical English POS tagger, and achieve an accuracy of 97.1%, higher than conventional approaches.
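
The abstract names two feature families for unknown words: substrings of the word itself and its surrounding context. It does not say which substrings or how wide a context window is used, so the following Python sketch is only an illustration under assumed choices: prefixes and suffixes of up to four characters, and a window of two words on each side. The function name `unknown_word_features`, the feature-name strings, and the `<PAD>` boundary marker are all hypothetical and not taken from the paper.

```python
from typing import Dict, List


def unknown_word_features(
    tokens: List[str],
    i: int,
    max_affix: int = 4,   # assumed maximum prefix/suffix length
    window: int = 2,      # assumed context width on each side
) -> Dict[str, int]:
    """Return sparse binary features for the token at position i."""
    word = tokens[i]
    feats: Dict[str, int] = {}

    # Substring features: every prefix and suffix up to max_affix characters.
    for n in range(1, min(max_affix, len(word)) + 1):
        feats[f"prefix={word[:n]}"] = 1
        feats[f"suffix={word[-n:]}"] = 1

    # Context features: lowercased neighbouring words inside the window,
    # with a padding symbol where the window runs past the sentence.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        neighbour = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
        feats[f"word[{offset:+d}]={neighbour}"] = 1

    return feats


if __name__ == "__main__":
    sentence = ["The", "zorbling", "cat", "sat"]
    for name in sorted(unknown_word_features(sentence, 1)):
        print(name)
```

A feature dictionary of this kind could be turned into a sparse vector and passed to any SVM implementation, with one classifier per POS tag or a multi-class scheme. Which setup the paper actually uses is not stated in the abstract.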

  • Source
    ABSTRACT: This paper presents a semantic setup for Dutch on the basis of deep processing. The parser and generator Delilah computes a system of logical forms that is both semantically adequate, and instrumental in processing tasks like disambiguation and inference. The logical forms are derivationally related but differ as to the level of specification and exploitability. The semantic setup is new, and is likely to be the first computed, fully specified semantics for Dutch. One of the logical forms introduces a new way of compiling out semantic dependencies. The resulting system is discussed at the crossroad of logical semantics and computational linguistics.
  • Source
    Tijdschrift Voor Geschiedenis. 01/2008
  • Source
    ABSTRACT: All types of part-of-speech (POS) tagging errors have been equally treated by existing taggers. However, the errors are not equally important, since some errors affect the performance of subsequent natural language processing (NLP) tasks seriously while others do not. This paper aims to minimize these serious errors while retaining the overall performance of POS tagging. Two gradient loss functions are proposed to reflect the different types of errors. They are designed to assign a larger cost to serious errors and a smaller one to minor errors. Through a set of POS tagging experiments, it is shown that the classifier trained with the proposed loss functions reduces serious errors compared to state-of-the-art POS taggers. In addition, the experimental result on text chunking shows that fewer serious errors help to improve the performance of subsequent NLP tasks.
    Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1; 07/2012
