Conference Paper

Application of Multilayer Perceptron Network for Tagging Parts-of-Speech.

Dept. of Comput. Sci. & Eng., Muffakham Jah Coll. of Eng. & Technol., Hyderabad, India
DOI: 10.1109/LEC.2002.1182291 Conference: 2002 Language Engineering Conference (LEC 2002), 13-15 December 2002, Hyderabad, India
Source: DBLP

ABSTRACT This paper presents a neural-network-based part-of-speech tagger that learns to assign the correct part-of-speech tag to each word in a sentence. A three-layer multilayer perceptron (MLP) network is used, trained with the error back-propagation learning algorithm. The representation scheme for the input and output of the network is adapted from Ma et al. (1966). The tagger is trained on the SUSANNE tagged English corpus, which consists of 156,622 words, using 85% of the corpus for training. Based on the tag mappings learned, the MLP-tagger achieved an accuracy of 90.04% on test data that also included words unseen during training. Results from our experiments suggest that the MLP-tagger, combined with the representation scheme adopted here, could be a better substitute for traditional tagging approaches. This method shows promise for part-of-speech tagging of Indian-language text, given that most Indian-language corpora, especially tagged ones, are still considerably small.
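The abstract does not state the layer sizes or the exact input and output encoding, so the following is only a minimal sketch of the general technique it names: a three-layer perceptron trained by error back-propagation to map a word window to a tag. The toy vocabulary, the one-hot encoding of the previous and current word, the softmax output, the hidden-layer size, and the names `MLPTagger`, `encode`, and `examples` are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random

# Toy data (assumed for illustration; the paper uses the SUSANNE corpus).
VOCAB = ["<pad>", "the", "a", "dog", "cat", "runs", "sleeps"]
TAGS = ["DET", "NOUN", "VERB"]
SENTENCES = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode(prev_word, word):
    # Assumed encoding: one-hot previous word followed by one-hot current word.
    return one_hot(VOCAB.index(prev_word), len(VOCAB)) + one_hot(VOCAB.index(word), len(VOCAB))

class MLPTagger:
    """Input layer -> sigmoid hidden layer -> softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.lr = lr
        # One row of weights per unit; the last entry of each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]  # append constant bias input
        h = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, xb)))) for row in self.w1]
        hb = h + [1.0]
        z = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        top = max(z)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(v - top) for v in z]
        total = sum(exps)
        return xb, hb, [e / total for e in exps]

    def train_step(self, x, target):
        xb, hb, y = self.forward(x)
        # Output-layer error for softmax with cross-entropy loss.
        d_out = [yi - ti for yi, ti in zip(y, target)]
        # Propagate the error back through the sigmoid hidden units,
        # using the output weights as they were before this update.
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(self.w1))
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= self.lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * d_hid[j] * xb[i]

    def predict(self, x):
        y = self.forward(x)[2]
        return y.index(max(y))

# Build (input vector, tag index) pairs from the toy sentences.
examples = []
for sentence in SENTENCES:
    prev = "<pad>"
    for word, tag in sentence:
        examples.append((encode(prev, word), TAGS.index(tag)))
        prev = word

tagger = MLPTagger(n_in=2 * len(VOCAB), n_hidden=8, n_out=len(TAGS))
for _ in range(500):  # plain per-example gradient descent
    for x, t in examples:
        tagger.train_step(x, one_hot(t, len(TAGS)))

predicted = [TAGS[tagger.predict(x)] for x, _ in examples]
print(predicted)
```

A realistic setup would hold out part of the corpus for testing, as the paper does with its 85% training split, and would need an encoding that can handle words unseen during training; this sketch evaluates only on its own training examples.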

  • "Section 3 introduces the Tamil language and the general morphophonemic concepts related to this topic. Description of the machine learning techniques used in this paper is given in Section 4. The implementation and the results of the machine learning models are discussed in Section 5."
    ABSTRACT: This paper presents machine learning solutions to a practical problem in Natural Language Generation (NLG): supervised word formation in agglutinative languages such as Tamil. The morphological generator is an important component of Natural Language Processing in Artificial Intelligence; it generates word forms given a root and affixes. Morphophonemic changes such as addition, deletion, and alternation occur when two or more morphemes or words are joined together. In rule-based morphological analyzers and generators, the sandhi rules must be specified explicitly. In a machine learning framework, these rules can be learned automatically from training samples and subsequently applied to new inputs. In this paper we propose machine learning models that learn the morphophonemic rules for noun declensions from the given training data. These models are trained to learn sandhi rules using various learning algorithms, and the performance of those algorithms is presented. From this we conclude that morphological processing such as word form generation can be learned successfully in a supervised manner, without explicit description of rules. The performance of decision-tree and Bayesian machine learning algorithms on noun declensions is discussed.
  • ABSTRACT: Support vector machines (SVMs) and related kernel methods have become widely known tools for text mining tasks such as classification and regression. An Arabic part-of-speech (POS) tagger based on support vector machines is designed and implemented. The NeuroSolutions software is used to implement and train the proposed tagger. Radial basis functions (RBFs) are used as a linear function approximator. The experiments show that the SVM tagger achieves an accuracy of 99.99%, has low processing time, and uses a small amount of data in the training phase.
    Information Technology, 2008. ITSim 2008. International Symposium on; 09/2008
  • ABSTRACT: In this paper, we describe our recent advances on a novel approach to Part-Of-Speech tagging based on neural networks. Multilayer perceptrons are used following corpus-based learning from contextual, lexical and morphological information. The Penn Treebank corpus has been used for the training and evaluation of the tagging system. The results show that the connectionist approach is feasible and comparable with other approaches.
    Current Topics in Artificial Intelligence, 13th Conference of the Spanish Association for Artificial Intelligence, CAEPIA 2009, Seville, Spain, November 9-13, 2009. Selected Papers; 01/2009