
H. L. Shashirekha
Mangalore University · Department of Computer Science
54 Publications
7,774 Reads
223 Citations
Publications (54)
The task of Language Identification (LI) in text processing refers to automatically identifying the languages used in a text document. The LI task has usually been studied at the document level and mostly for high-resource languages, with less importance given to low-resource languages. However, with the recent advancement in technologies, in a multiling...
The task of automatically identifying the language used in a given text is called Language Identification (LI). India is a multilingual country, and many Indians, especially youths, are comfortable with Hindi and English in addition to their local languages. Hence, they often use more than one language to post their comments on social media. Texts cont...
Curfews and lockdowns around the world in the Covid-19 era have drastically increased internet usage and, accordingly, the amount of data shared on social media. In addition to social media being used for sharing useful information, some miscreants are using its power to spread hate speech and offensive content. Filtering the of...
The number of acronyms in texts is growing with the increasing number of scientific articles, and this growth is not limited to English texts. The Acronym Extraction (AE) task aims at automatically identifying and extracting the acronyms and their long forms in a given text. To tackle the challenge of AE in different languages, this paper describe...
Analyzing sentiments or opinions in code-mixed languages is gaining importance due to the increasing use of social media and online platforms, especially during the Covid-19 pandemic. In a multilingual society like India, code-mixing and script-mixing are quite common, as people, especially the younger generation, are quite familiar with using more than...
Offensive Language Identification (OLI) in code-mixed under-resourced Dravidian languages is a challenging task due to the complex characteristics of code-mixed text and scarcity of digital resources and tools to process these languages. This paper describes the strategy proposed by our team MUCIC for the 'Dravidian-CodeMix-HASOC2021' shared task w...
Social media usually contains various forms of toxic content, such as Hate Speech (HS) and offensive and abusive language, in addition to useful and relevant content. Offensive content on social media may target a religion, a community, an individual or a group of people with specific thoughts and beliefs. A category of offensive conten...
Identifying fake news shared on social media is a vital task due to its immense negative effects on society, a community, an individual or whoever is the target. Manually controlling and managing the fake news shared on social media is impractical due to the increasing number of social media users, increasing volume of fake news...
Fazlourrahman, B.; Aparna, B. K.; Shashirekha, H. L.
In view of the COVID-19 outbreak, the world is facing a lot of issues related to public health. Online media and platforms, especially during the present pandemic, have increased the popularity of many online applications and blogs. A few people are using this opportunity for a good cause, whereas a few ot...
Machine Learning (ML) and Deep Learning (DL) algorithms are becoming increasingly popular for predicting household food security status, which can be used by a country's government and policymakers to provide food supply to the needy in case of emergency. ML models, namely k-Nearest Neighbor (kNN), Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Multi-Layer...
Spreading positive vibes or hope content on social media may help motivate many people in their lives. To address Hope Speech detection in YouTube comments, this paper describes the models submitted by our team MUCIC to the Hope Speech Detection for Equality, Diversity, and Inclusion (HopeEDI) shared task at Association fo...
Abusive language content such as hate speech, profanity and cyberbullying, which is common on online platforms, is creating a lot of problems for users as well as policymakers. Hence, the detection of such abusive language in user-generated online content has become increasingly important over the past few years. Online platforms strive hard to...
Automatically handling the enormous amount of text data being generated at tremendous speed is ongoing work in text processing for various applications. Event Detection (ED) is one such application, which aims to extract information about events in a given text based on the words that indicate the events. It acts as a preprocessing st...
A critical embedded control system, along with its real-time software, requires high reliability in its design, development and maintenance. A failure in any critical software creates risks to system safety and introduces hazards. Reliability is a major component of performance evaluation, and it is inversely proportional to the defects at every sta...
Complex learning approaches along with complicated and expensive features are not always the best or the only solution for Natural Language Processing (NLP) tasks. Despite huge progress and advancements in learning approaches such as Deep Learning (DL) and Transfer Learning (TL), there are many NLP tasks such as Text Classification (TC), for which...
Technological developments in the healthcare industry are generating large numbers of electronic health records as well as other text data, usually referred to as medical text data. Processing unstructured medical text data is challenging but also has many applications. Named entity recognition, the task of extracting named entities and classi...
Profane or abusive speech intended to humiliate and target individuals, a specific community or groups of people is called Hate Speech (HS). Identifying and blocking HS content is only a temporary solution. Instead, developing systems that are able to detect and profile the content polluters who share HS would be a better option. In...
Social media analytics are being widely explored by researchers for various applications. Prominent among them is identifying and blocking abusive content, especially content targeting individuals and communities for various reasons. The increasing amount of abusive content and the increasing number of users on social media demand automated tools to detect and fi...
Safety-critical software is a key component of any critical system, and whenever this software fails, the system malfunctions, affecting the safety of life or mission. Reliability is one of the quality factors and performance evaluators for such critical software. High reliability is expected of such software in its design, developme...
This paper describes the models submitted by the team MUCS for Offensive Language Identification in Dravidian Languages-EACL 2021 shared task that aims at identifying and classifying code-mixed texts of three language pairs namely, Kannada-English (Kn-En), Malayalam-English (Ma-En), and Tamil-English (Ta-En) into six predefined categories (5 catego...
Sentiments are usually written using a combination of languages, such as English, which is resource-rich, and regional languages such as Tamil, Kannada and Malayalam, which are resource-poor. However, due to technical constraints, many users prefer to pen their opinions in Roman script rather than in their native scripts. These kinds of texts wr...
This paper describes the models submitted by the team MUCS for "Hope Speech Detection for Equality, Diversity, and Inclusion-EACL 2021" shared task that aims at classifying a comment / post in English and code-mixed texts in two language pairs, namely, Tamil-English (Ta-En) and Malayalam-English (Ma-En) into one of the three predefined categories,...
The 21st century has been called the age of information technology. Social applications such as Facebook, Twitter and Instagram have become a fast and massive medium for spreading news over the internet. At the same time, the ability to widely spread low-quality news with intentionally false information is creating havoc, causing damage...
Antisocial elements in social media take advantage of the anonymity in the cyber world and indulge in vulgar and offensive communications such as bullying, trolling, harassment etc. Many youths experiencing such victimization are reported to have psychological symptoms of anxiety, depression and loneliness. These issues have become a growing concer...
Detecting fake news from the real news can be modeled as a typical binary text classification problem. Most of the models proposed for fake news detection address resource-rich languages such as English and Spanish, but languages such as Urdu, Persian, Balouchi and many Indian native languages have received very little attention due to the unavailabi...
The increasing use of social media and online shopping is generating a lot of text data containing sentiments or opinions about anything and everything available on these platforms. Users usually use Roman script to pen their sentiments in their own language, in addition to using English words, due to technological limitations of using their nat...
The increase in domain-specific text processing applications is creating demand for tools and techniques for domain-specific Text Classification (TC), which may be helpful in many downstream applications like Machine Translation, Summarization and Question Answering. Further, many TC algorithms are applied on globally recognized languages like English givi...
Part-of-speech (POS) tagging is considered one of the basic but necessary tools required for many Natural Language Processing (NLP) applications such as word sense disambiguation, information retrieval, information processing, parsing, question answering and machine translation. The performance of current POS taggers for Amharic is not...
Much automatic translation work has addressed major European language pairs by taking advantage of large-scale parallel corpora, but very little research has been conducted on the Amharic-Arabic language pair due to the scarcity of parallel data. Two Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) based Neural Machine Trans...
In this paper, we describe the system submitted by our team for the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task held at FIRE 2019. Hate speech and offensive language detection has become an important task due to the overwhelming usage of social media platforms in our daily lives. This task has been ap...
In recent years, Deep Learning (DL) models have become important due to their demonstrated success at overcoming complex learning problems. DL models have been applied effectively to different Natural Language Processing (NLP) tasks such as Part-of-Speech (PoS) tagging and Machine Translation (MT). Disease Named Entity Recognition (Disease-NER) i...
Biomedical Named Entity Recognition (BioNER) is a crucial step for analyzing Biomedical texts, which aims at extracting biomedical named entities from a given text. Different supervised machine learning algorithms have been applied for BioNER by various researchers. The main requirement of these approaches is an annotated dataset used for learning...
In this paper, the systems submitted by Mangalore University team for Indian Native Language Identification (INLI) task have been described. Native Language Identification (NLI) has different applications such as social media analysis, authorship identification, second language acquisition and forensic investigation. We submitted three systems usin...
Biomedical Named Entity Recognition (BioNER) is a crucial step for analyzing Biomedical texts, which aims at extracting biomedical named entities from a given text. Different supervised machine learning algorithms have been applied for BioNER by various researchers. The main requirement of these approaches is an annotated dataset used for learning...
Analysis of gene expression data obtained from microarray experiments is helpful for various biological purposes, such as identifying differentially expressed genes, disease classification and predicting the survival rate of patients. However, data from microarray experiments come with small sample sizes and thus have limited statistical power for any an...
Clinical Named Entity Recognition (Clinical-NER), which aims at identifying and classifying clinical named entities into predefined categories, is a critical preprocessing task in health information systems. Different machine learning approaches have been used to extract and classify clinical named entities. Each approach has its own strength as we...
This paper describes the systems submitted by our team for Indian Native Language Identification (INLI) task held in conjunction with FIRE 2017. Native Language Identification (NLI) is an important task that has different applications in different areas such as social-media analysis, authorship identification, second language acquisition and forens...
Gene expression data from microarray experiments are widely used for large-scale gene expression analysis, which facilitates the investigation of fundamental biological processes at the molecular level. Such an investigation may be helpful for various biological purposes including disease diagnosis and prognosis, biomarker detection, differentially expre...
Biomedical Named Entity Recognition (Bio-NER) is an important subtask of Biomedical Text Mining (BioTM), where the performance of further tasks, such as relation extraction, protein-protein interaction and hypothesis generation, depends on the performance of Bio-NER. Bio-NER involves determining the biomedical named entities, such as DNA, RNA, cell t...
Gene expression data suffer from the curse of dimensionality due to the presence of several thousand genes (features) but only a small number of samples. This problem of a large feature space is addressed by feature selection algorithms, which aim at finding a comparatively small set of significant features by removing the redundant and irrelevant feat...
Microarray technology makes it possible to measure the expression levels of thousands of genes simultaneously in an efficient and inexpensive manner. However, due to various complexities in processing microarrays, expression information for some genes may be missing because of unreliable measurements. The occurrence of missing values in gene expression da...
This paper presents a system for retrieving information from a domain-specific document collection made up of data-rich, unnatural-language text documents. Instead of conventional keyword-based retrieval, our system uses a domain ontology to retrieve information from a collection of documents. The system addresses the problem of representi...
Extracting information from unstructured, brief text composed of short phrases, incomplete sentences, unordered sequences of words and abbreviated words that do not follow any regular syntax is a challenging task. This paper describes an approach to automatically extract information from data-rich, unstructured text documents based on a dom...