
Shikhar Kr. Sarma
- Ph.D.
- Professor at Gauhati University
About
Publications: 72
Reads: 54,057
Citations: 520
Introduction
NLP, Language Technology in Assamese, Bodo
Current institution
Additional affiliations
January 2008 - present
Publications (72)
This paper proposes an approach to text summarization in the Assamese language. Text summarization is an important natural language processing application that condenses the original text by retrieving the relevant information while retaining its main content. In the proposed approac...
The number of social media users and online news portals is rising exponentially in the Northeastern state of India, where users post every small instance of daily life on social media platforms such as Facebook and Twitter in their native language. Every social media user posts about their daily experiences on such social media platforms which...
The popularity of mobile devices and the availability of the Internet increase the use of various online platforms for chatting and communicating with others. Due to the use of such platforms, the use of local languages is also increasing because everyone feels comfortable with his/her mother tongue. In this research work, clustering of Assamese wo...
IEEE 802.11 Wireless LAN, popularly known as WiFi, has become a favored source of internet connectivity for most offices and organizations. Due to the rapid growth of multimedia data and VoIP, and the need to provide better quality of service (QoS), bandwidth and bandwidth management have become important factors in 802.11 wireless LA...
In this paper, an extractive text summarization approach using the Assamese WordNet is proposed, and the difficulties faced while extracting a summary from an Assamese document are discussed. Assamese is a low-resource language. Synsets from the Assamese WordNet are applied. The various features used for identifying the most salient sentences to gener...
This review article sheds light on text-to-speech synthesis for the Assamese language using the unit selection concatenative speech synthesis technique. Assamese is one of the North East Indian languages and is spoken by millions of people. The article highlights some major difficulties in developing the synthesizer. The speech un...
It is very common to miss certain words when writing while listening to others. A similar problem can arise when typing on a computer. Automatically generating the missed words would be very helpful to users by suggesting the required words. In this research work, the missed words of Assamese sentences are generated. At present, there is no suc...
WordNets have been used in a wide variety of applications, including in the design and development of intelligent and human-assisting systems. Although WordNet was initially developed as an online lexical database (Miller, 1995 and Fellbaum, 1998), later developments have inspired using the WordNet database as a resource in NLP applications, Languag...
The present-day scenario demands the use and adoption of Green Computers, Green Computing and, as such, Green Computer Networks. Many industries have already initiated tasks to attain green soft-computing and green networking in line with eco-friendly technology, so as to help the ecosystem and thus make an impact in the biological sciences....
A natural language, or everyday language, is a customary form of communication that people use to speak, express themselves and write. These languages are called natural because they evolved naturally among communities. Natural Language Processing is a vital field in connection with Artificial Intelligence, where research has expon...
With the rapid growth of ICT, almost all entities that exist in this world are moving towards a computing-enabled digital space. At present, the computing infrastructure is not only an infrastructure for scientific computation and communication, but is also considered a platform for computation of anything appearing at anytime and anywhere...
Text summarization is the task of condensing input text documents into a shorter version while retaining the overall meaning and information of the original document. Though plenty of research on text compaction has been done in English and other languages to date, text compaction in the Assamese language still lags behind due to the l...
A statistical machine translation system for the Assamese-English language pair, developed using Marie.
The demand for Machine Translation (MT) is increasing due to the growing rate of information exchange around the globe. With the Internet as the main channel of information sharing, the source of information is not confined to a specific geographical location or a specific language. MT is the way of translating from one language to another...
Word prediction is a technique that tries to suggest a word by observing the previously entered letters or words in a text editor. At present there is no software or tool for Assamese that can predict the upcoming word(s) of a sentence. This method helps people who are not expert typists, and this research aims to reduce the gap be...
Analyzing the morphology of a word is a crucial task, and the analysis may vary across languages depending on their grammar. Assamese is a language spoken by the people of Assam, in the northeastern part of India, south of the eastern Himalayas. Assamese is the major language spoken in Assam and it serves as a bridge language among diff...
Prosody is a term related to literature as well as speech technology. It is one of the primary components in the design of any natural-sounding text-to-speech synthesis system. Prosody is a broad and complex way of expressing the meaning of a text segment in terms of pitch (the fundamental frequency of the utterance), loudness or intensity, intonation, and rhyth...
This paper deals with a major issue in designing a Text-to-Speech synthesizer. To design a speech synthesizer, we need speech prosody in which all significant utterance-related information is systematically stored. An utterance can be divided into the segmental level as well as the suprasegmental level. The suprasegmental level deals with syll...
To build synthesized discourse close to human speech, it is important that the content-preparing segment delivers a suitable arrangement of phonemic units corresponding to the text given as input. Detailed experimental research has been carried out throughout the project to syllabify Assamese words by using existing Phonetic Al...
This paper presents the design and development of an expert system that aims to provide a suitable diagnosis of some of the diseases of rice plants. An expert system can be defined as a computer program that uses encoded knowledge to solve problems in a specific domain that normally requires a specialized human expert. The proposed system composed...
Word Sense Disambiguation (WSD) aims to automatically disambiguate words that have multiple senses in a context. Sense denotes the meaning of a word, and words that have various meanings in a context are referred to as ambiguous words. WSD is vital in many important Natural Language Processing tasks like MT, IR, TC, SP etc. This research paper atte...
Word Sense Disambiguation (WSD) is the process of identifying the proper sense of an ambiguous word depending on the particular context. It is to find the accurate sense si among the set of senses {s1, s2, ..., sn}. This task was motivated by its interpretation in various Natural Language Processing (NLP) applications like IR, MT, QA, TC, SP etc. In this...
Searching for a document in the huge collection across the internet is becoming a challenge. Like other languages, the Bodo language is also contributing content to the electronic world. Bodo is widely used in the North Eastern states of India. As text documents are increasing exponentially across the web, grouping similar documents for versatile applications...
Information or knowledge contained in texts is structured in a language-specific syntactic form. Such texts are neither understood nor processed by computers. They must be organized in a structured form. Structured representation of sentences written in a particular language enables a computer to have a good understanding of the knowledge they co...
Morphological Analysis is an important branch of linguistics for any Natural Language Processing Technology. Morphology studies the structure and formation of the words of a language. In the current scenario of NLP research, morphological analysis techniques are becoming more popular day by day. For processing any language, morphology of the word should...
This work primarily focuses on different aspects of designing a spell checker for the Assamese language and integrating it as an add-on into Open Office Writer. Besides emphasizing error detection and suggestion generation, the programming model and the challenges of developing the add-on for the aforementioned application are also discussed. The sy...
Machine translation is the process of translating text from one language to another. In this paper, Statistical Machine Translation is performed on the Assamese and English languages using their respective parallel corpus. Moses, a statistical phrase-based translation toolkit, is used here. To develop the language model and to align the words we used two a...
The IEEE 802.11 WLAN is primarily used for web browsing, which belongs to the category of non-real-time applications. But demand for real-time applications like VoIP and video conferencing has become very common on such WLANs. With the IEEE 802.11e MAC protocol it is possible to improve the QoS for both real-time and non-real-time traffic by service di...
Machine Translation is the task of automatically translating text from a source language to a target language. Here, we describe a system that translates English text into Assamese, based on the phrase-based statistical translation technique. To overcome the translation problem related to highly open word class lik...
The paper defines electrical noise and its types. Electromagnetic Interference (EMI), which is one type of electrical noise, is also defined, and general techniques used for controlling EMI are described. Networking cables are affected by the EMI caused by a nearby power cable and data transmission through Unshielded Twisted Pair (U...
The increasing volume of Assamese text content in digital format encourages us to develop a system that categorizes it automatically. This paper discusses a system that performs text categorization automatically based on knowledge from the Assamese WordNet. In WordNet, a synset corresponds to the words that imply the same concept and...
Multiword Expressions (MWEs) are sequences of words, separated by a space or delimiter, that carry a unique meaning instead of the words' individual meanings. Our work concentrates on automatic identification of MWEs for two less computationally aware languages, Assamese and Bodo, spoken in the North Eastern part of India. Statistical measure and Langua...
The performance of an automatic speaker recognition (ASR) system degrades drastically in the presence of noise and other distortions, especially when there is a noise level mismatch between the training and testing environments. This paper explores the problem of speaker recognition in noisy conditions, assuming that speech signals are corrupted by no...
The objective of the paper is to give an overview of copper-based networking cables and their important characteristics. Copper-based cables, especially UTP cables, are very sensitive to EMI, whereas optical fiber is insensitive to EMI. Even so, UTP cable is still the most popularly used networking cable, supporting the standards up to Ten Gigabit Et...
Assamese is one of the regional languages of India, spoken by the people of Assam and other north eastern states of India. Parts of Speech (POS) tagging is one of the most important research issues, as it is a basic need for any Natural Language Processing (NLP). An automated way to provide a Parts of Speech label to a word in a context is known as...
Extracting the information a user expects from a large text collection based on some query is the aim of an Information Retrieval (IR) system. Nowadays Assamese digital documents are increasing at a huge rate, and to collect information efficiently from them we need an Assamese IR system for retrieving documents. Comparing query and d...
The objective of the paper is to outline the current trend of cabling in the networking world. Today, UTP cable (CAT5e, CAT6) is the most popularly used cable to provide Gigabit Ethernet networking, despite some disadvantages compared to fiber optic cable, which offers more signal reliability. The paper tries to give an overview of all categories...
Data transmission through a UTP cabling system is affected by EMI from a nearby power line through the coupling mechanisms. Today, although UTP cable is the most preferred cabling supporting 10G Ethernet, it is also the cable most influenced by EMI, since it is unshielded. Shielding and Physical Separation are the two most effective methods to av...
An expert system is a computer program composed of a knowledge base, an inference engine and a user interface. Its technical aspect involves the design and implementation of the architectural model of an expert system, namely the knowledge base component, the graphical user interface component, the application component and the database. This paper presents...
IEEE 802.11 WLAN is primarily used for non-real-time traffic, but in recent times real-time traffic such as VoIP and video conferencing has emerged as an exciting and heavily used application class in such WLANs, which needs special attention to attributes like delay sensitivity or bandwidth requirement. The IEEE 802.11e MAC protocol produces improved perf...
The objective of the paper is to study Electromagnetic Interference (EMI) produced by AC power lines and its effect on communication/networking cables. Causes of EMI and techniques used for reducing EMI are pointed out. The aim of the paper is to investigate and analyze the research works, standards, and studies on the effect of AC power lines on UT...
The present paper deals with the design and implementation of multilingual lexical resources for the Assamese and Bodo languages with the help of the Hindi Wordnet. Here, we present the multilingual dictionaries (for Hindi, Assamese and Bodo) and synset-based word search for the Assamese-Hindi and Bodo-Hindi language pairs. These words, of course, will have to go through...
Integrating expert systems based on fuzzy logic and its inferences is a new dimension of research in e-learning environments. Standards so far do not provide suitable methods to extract a learner's correct expertise level in an e-learning environment. This paper depicts the adaptation of expert system technology using fuzzy logic and inferences to handle the...
Although expert systems have applications in many fields, from agriculture to the diagnosis of diseases in patients, they also have potential for extensive contribution in digital learning. This paper discusses and analyses the present applications of expert systems in e-learning to assess their usefulness and effectiveness. The...
Kinship terms form a considerable part of the Wordnet in any language. Most kinship terms interact with each other through different relational characteristics of the Wordnet. This paper explores the area of kinship terms in the Assamese language, and outlines the standard kinship relations and the associated set of terms in the language. The formation of such ter...
This paper presents an architectural framework of an Expert System in the area of agriculture and describes the design and development of the rule-based expert system, using the shell ESTA (Expert System for Text Animation). The designed system is intended for the diagnosis of common diseases occurring in the rice plant. An Expert System is a compu...
Development of Wordnets of regional languages has been of great concern in recent years. This is mainly due to the ever increasing demands and requirements of putting those languages as effective media of the digital world, including the internet. As the technologies for putting regional languages in the digital media are being developed, research...
We have made an attempt to study the spectral characteristics of two North East Indian languages, Assamese and Boro, coming from different genres. We have taken a few words with similar, partially similar, and dissimilar characteristics in their nature of utterance from Assamese and Boro. The spectral analysis reveals that both the languages have a...
This paper discusses the linguistic foundations for developing a Bodo Wordnet, describing the Bodo language characteristics and properties specific to the development of a Wordnet. The characteristics of the Bodo language in terms of its morphological and syntactic structure are outlined. Important characteristics related to the building of a Wordnet are...
In this paper, a new simplified approach is presented for the design and implementation of a noise-robust speech recognition system using a Multilayer Perceptron (MLP) based Artificial Neural Network and LPC-Cepstral Coefficients. Cepstral matrices obtained via Linear Prediction Coefficients are chosen as the eligible features. Here, MLP neural network based...