Article

Abstract

This paper describes the techniques we used to build a production-ready, scalable classifier system that classifies tickets raised by the employees of an organization. End users raise tickets in natural language, and the classifier then classifies them automatically. This is a practical, industry-oriented applied research paper in the area of machine learning. We applied several machine learning and natural language processing techniques, including active learning and random undersampling, which improved prediction accuracy. We used clustering to handle data issues found in the training data. The core classifier combines the results of multiple machine learning algorithms using suitable scoring techniques. The overall solution architecture ensures that the system meets the production-grade software attributes of reliability, availability, and scalability.
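The abstract names three techniques without giving their details: random undersampling, active learning, and combining the scores of several classifiers. The following is a minimal illustrative sketch of each, using only the Python standard library. All function names, the equal-size undersampling target, the least-confidence selection rule, and the weighted-average scoring are assumptions made for illustration; the abstract does not state which variants the authors actually used.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly downsample every class to the size of the rarest class.

    Assumption: balancing to the minority-class size; other targets are possible.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def combine_scores(per_model_scores, weights=None):
    """Combine per-class scores from several models by weighted averaging.

    per_model_scores: one dict per model, mapping class label -> score.
    Assumption: a weighted average stands in for the unspecified scoring technique.
    """
    if weights is None:
        weights = [1.0] * len(per_model_scores)
    total = sum(weights)
    combined = defaultdict(float)
    for scores, weight in zip(per_model_scores, weights):
        for label, score in scores.items():
            combined[label] += weight * score / total
    best = max(combined, key=combined.get)
    return best, dict(combined)


def least_confident(pool_scores, k=1):
    """Pick the k unlabeled items whose top class score is lowest.

    Assumption: least-confidence uncertainty sampling as the active-learning query rule.
    """
    ranked = sorted(range(len(pool_scores)),
                    key=lambda i: max(pool_scores[i].values()))
    return ranked[:k]
```

In such a setup, `random_undersample` would be applied to the labeled tickets before training, `combine_scores` would merge the per-class scores that each trained model returns for a new ticket, and `least_confident` would choose which unlabeled tickets to send to a human annotator next.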

Article
Text categorization is the task of assigning text or documents to pre-specified classes or categories. To classify documents well, text-based learning needs to understand context: just as humans judge the relevance of a text from the context associated with it, machine learning needs context information incorporated alongside the text to achieve better classification accuracy. This can be achieved by using semantic information, such as part-of-speech tagging, associated with the text. The aim of this experimentation is therefore to use this semantic information to select features that may provide better classification results. Different datasets are constructed, each with a different collection of features, to understand which representation of text data works best for different types of classifiers.
Conference Paper
This paper presents a novel adaptive synthetic (ADASYN) sampling approach for learning from imbalanced data sets. The essential idea of ADASYN is to use a weighted distribution for different minority class examples according to their level of difficulty in learning, where more synthetic data is generated for minority class examples that are harder to learn compared to those minority examples that are easier to learn. As a result, the ADASYN approach improves learning with respect to the data distributions in two ways: (1) reducing the bias introduced by the class imbalance, and (2) adaptively shifting the classification decision boundary toward the difficult examples. Simulation analyses on several machine learning data sets show the effectiveness of this method across five evaluation metrics.
Article
This paper deals with the functionality of a research program complex for processing and analysis of unstructured text data. It provides a thorough description of the implemented algorithms and the experimental results obtained on various text samples. With the recent disclosure of PRISM and similar governmental data surveillance programs, intelligent surveillance of the open-source data that circulates on the World Wide Web is gaining special importance. Our results might be used to improve the speed and mode of unstructured text data analysis and to help tackle new threats, including international terrorism and espionage (including cyber-espionage), in the globalized world.
Conference Paper
We present an active learning scheme that exploits cluster structure in data.
Conference Paper
Methods that create several classifiers out of one base classifier, so-called ensemble creation methods, have been proposed and successfully applied to many classification problems recently. One category of such methods is Boosting, with AdaBoost being the best-known procedure belonging to this category. Boosting algorithms were first developed for two-class problems, but then extended to deal with multiple classes. Yet these extensions are not always suitable for problems with a large number of classes. In this paper we introduce some novel boosting algorithms which are designed for such problems, and we test their performance in a handwritten word recognition task.