This paper presents a neural-network-based part-of-speech tagger that learns to assign the correct part-of-speech tag to each word in a sentence. A multilayer perceptron (MLP) network with three layers is used, trained with the error back-propagation learning algorithm. The representation scheme for the input and output of the network is adapted from Ma et al. (1966). The tagger is trained on 85% of the SUSANNE tagged English corpus, which consists of 156,622 words. Based on the tag mappings learned, the MLP-tagger achieved an accuracy of 90.04% on test data that included words unseen during training. Results from our experiments suggest that the MLP-tagger, combined with the representation scheme adopted here, could be a better alternative to traditional tagging approaches. The method shows promise for part-of-speech tagging of Indian language text, given that most Indian language corpora, especially tagged ones, are still considerably small.
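As a rough illustration of the kind of tagger described in the abstract above, the following is a minimal sketch of a three-layer MLP (input, hidden, output) trained with error back-propagation. It makes several assumptions that are not taken from the paper: the toy corpus is invented rather than drawn from SUSANNE, the input is a plain one-hot window over the previous, current, and next word rather than the representation scheme of Ma et al., and the hidden-layer size, learning rate, and names such as `encode`, `MLP`, and `tag` are hypothetical. It trains on all of the toy data and does not reproduce the 85% training split.

```python
import math
import random

# Hypothetical toy corpus; NOT taken from SUSANNE.
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
WORDS = sorted({w for s in CORPUS for w, _ in s}) + ["<pad>"]
TAGS = sorted({t for s in CORPUS for _, t in s})


def encode(sentence, pos):
    """Assumed input scheme: one-hot vectors for previous, current, next word."""
    vec = []
    for offset in (-1, 0, 1):
        j = pos + offset
        word = sentence[j][0] if 0 <= j < len(sentence) else "<pad>"
        if word not in WORDS:  # unknown words fall back to the padding symbol
            word = "<pad>"
        idx = WORDS.index(word)
        vec += [1.0 if i == idx else 0.0 for i in range(len(WORDS))]
    return vec


class MLP:
    """Input layer -> sigmoid hidden layer -> softmax output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def train_step(self, x, target, lr):
        """One stochastic gradient step; returns the cross-entropy loss before the update."""
        h, p = self.forward(x)
        # Output error for softmax with cross-entropy loss.
        d_out = [p[k] - (1.0 if k == target else 0.0) for k in range(len(p))]
        # Propagate the error back through the sigmoid hidden layer.
        d_hid = [h[j] * (1.0 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(p)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j in range(len(h)):
                self.w2[k][j] -= lr * dk * h[j]
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return -math.log(p[target])


def train(model, examples, epochs=200, lr=0.5):
    """Return the mean loss per epoch so convergence can be inspected."""
    losses = []
    for _ in range(epochs):
        losses.append(sum(model.train_step(x, t, lr) for x, t in examples) / len(examples))
    return losses


def tag(model, words):
    sentence = [(w, None) for w in words]
    tags = []
    for i in range(len(words)):
        _, probs = model.forward(encode(sentence, i))
        tags.append(TAGS[probs.index(max(probs))])
    return tags


examples = [(encode(s, i), TAGS.index(t)) for s in CORPUS for i, (_, t) in enumerate(s)]
model = MLP(n_in=3 * len(WORDS), n_hidden=8, n_out=len(TAGS))
losses = train(model, examples)
print(tag(model, ["the", "dog", "barks"]))
```

A realistic experiment would hold out part of the corpus for testing and would need a representation that copes with unseen words; here unknown words are simply mapped to the padding symbol, which is a simplification.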
"Section 3 introduces the Tamil language and the general morphophonemic concepts related to this topic. Description of the machine learning techniques used in this paper is given in Section 4. The implementation and the results of the machine learning models are discussed in Section 5."
ABSTRACT: This paper presents machine learning solutions to a practical problem of
Natural Language Generation (NLG), particularly the word formation in
agglutinative languages like Tamil, in a supervised manner. The morphological
generator is an important component of Natural Language Processing in
Artificial Intelligence. It generates word forms given a root and affixes. The
morphophonemic changes such as addition, deletion, and alternation occur when
two or more morphemes or words are joined together. The Sandhi rules have to be
explicitly specified in rule-based morphological analyzers and generators.
In machine learning framework, these rules can be learned automatically by the
system from the training samples and subsequently be applied to new inputs. In
this paper we propose machine learning models that learn the
morphophonemic rules for noun declensions from the given training data. These
models are trained to learn sandhi rules using various learning algorithms, and
the performance of those algorithms is presented. From this we conclude that
morphological processing such as word form generation can
be learned successfully in a supervised manner, without explicit description of
rules. The performance of decision tree and Bayesian machine learning
algorithms on noun declensions is discussed.
ABSTRACT: Support vector machines (SVMs) and related kernel methods have become widely known tools for text mining tasks such as classification and regression. An Arabic part-of-speech (POS) tagger based on support vector machines is designed and implemented. The NeuroSolutions software is used to build and train the proposed tagger. The radial basis function (RBF) is used as a linear function approximator. The experiments give evidence that the SVM tagger achieves an accuracy of 99.99%, has low processing time, and uses a small amount of data in the training phase.
Information Technology, 2008. ITSim 2008. International Symposium on; 09/2008
ABSTRACT: In this paper, we describe our recent advances on a novel approach to Part-Of-Speech tagging based on neural networks. Multilayer perceptrons are used following corpus-based learning from contextual, lexical and morphological information. The Penn Treebank corpus has been used for the training and evaluation of the tagging system. The results show that the connectionist approach is feasible and comparable with other approaches.
Current Topics in Artificial Intelligence, 13th Conference of the Spanish Association for Artificial Intelligence, CAEPIA 2009, Seville, Spain, November 9-13, 2009. Selected Papers; 01/2009