Sabrina Tiun

Universiti Kebangsaan Malaysia | ukm · Center for Artificial Intelligence Technology

PhD, Universiti Sains Malaysia

About

85
Publications
29,235
Reads
634
Citations
Citations since 2016
59 Research Items
534 Citations
Additional affiliations
November 2011 - present
Universiti Kebangsaan Malaysia
Position
  • Professor (Associate)

Publications (85)
Article
Full-text available
COVID-19 (coronavirus disease 2019) is an ongoing global pandemic caused by severe acute respiratory syndrome coronavirus 2. Recently, it has been demonstrated that the voice data of the respiratory system (i.e., speech, sneezing, coughing, and breathing) can be processed via machine learning (ML) algorithms to detect respiratory system diseases,...
Article
Other languages have influenced Arabic because of several factors, such as geographical nearness, trade communication, past Islamic conquests, science and technology, new devices, brand names, models, and fashion. As a result of these factors, foreign words are used in Arabic text and are known as Arabised words. Arabised words affect the Arabic na...
Article
Full-text available
Many works have employed Machine Learning (ML) techniques in the detection of Diabetic Retinopathy (DR), a disease that affects the human eye. However, the accuracy of most DR detection methods still needs improvement. Gray Wolf Optimization-Extreme Learning Machine (GWO-ELM) is one of the most popular ML algorithms, and can be considered as an accu...
Article
Full-text available
Twitter is a popular social media platform in Malaysia that allows for 280-character microblogging. Almost everything that happens in a single day is tweeted by users. Because of the popularity of Twitter, most Malaysians use it daily, providing researchers and developers with a wealth of data on Malaysian users. This paper explains why and how thi...
Article
Full-text available
Automatic Emotion Speech Recognition (ESR) is considered an active research field in Human-Computer Interface (HCI). Typically, an ESR system consists of two main parts: Front-End (feature extraction) and Back-End (classification). However, most previous ESR systems have focused on the feature extraction part only and ignored th...
Article
Full-text available
One of the most important phases in text processing is stemming, whose aim is to aggregate all variations in a word into one group to aid natural language processing. The morphological structure of the Arabic language is more challenging than that of the English language; thus, it requires superior stemming algorithms for Arabic stemmers to be effe...
Article
Full-text available
In multilabel classification, each sample can be allocated to multiple class labels at the same time. However, one of the prominent problems of multilabel classification is missing labels (incomplete labels) in multilabel text. The multilabel classification performance is reduced significantly with the presence of missing labels. In order to addres...
Article
Full-text available
Spoken language identification (LID) is the process of determining and classifying natural language from a given content and dataset. Data must be processed to extract useful features to perform LID. The mel-frequency cepstral coefficient (MFCC) is one of the most popular feature extraction techniques in LID. The MFCC features are generated to serv...
Conference Paper
Full-text available
The technique used for recognizing a language by utilizing pronounced speech is called spoken Language Identification (LID). This field has a high significance in the interaction between human and computer. Besides, it can be implemented in several applications such as call centers, speaker diarization in multilingual environments, and in translati...
Article
Full-text available
Superior stemming algorithms aid significantly in many natural language processing (NLP) applications such as information retrieval. Arabic light-based stemmer is one of the most important stemming algorithms. However, partially due to the highly inflected and complex morphological structure of the Arabic language, most of the existing Arabic light-...
Article
Simultaneous multiple labelling of documents, also known as multilabel text classification, will not perform optimally if the classes are highly imbalanced. Class imbalance entails skewness in the underlying data distribution, which makes classification more difficult. Random over-sampling and under-sampling are common approaches to solve t...
Article
Full-text available
Removal of stop words is essential in Natural Language Processing and text-related analysis. Existing works on Malay stop words are based on standard Malay and Quranic/Arabic translations into Malay. Thus, there is a lack of domain-specific stop word lists, making existing lists ill-suited to processing Malay parliamentary discourse. In this paper, we propo...
Article
Full-text available
Existing text clustering methods utilize only one representation at a time (single view), whereas multiple views can represent documents. The multiview multirepresentation method enhances clustering quality. Moreover, existing clustering methods that utilize more than one representation at a time (multiview) use representation with the same nature....
Article
Full-text available
The coronavirus disease (COVID-19) is an ongoing global pandemic caused by severe acute respiratory syndrome coronavirus 2. Chest Computed Tomography (CT) is an effective method for detecting lung illnesses, including COVID-19. However, the CT scan is expensive and time-consuming. Therefore, this work focuses on detecting COVID-19 using chest X-ray images because...
Article
Full-text available
Aspect-based sentiment analysis (ABSA) has recently attracted increasing attention due to its extensive applications. Most of the existing ABSA methods have been applied on small-sized labeled datasets. However, real datasets such as those from Amazon and TripAdvisor contain a massive number of reviews. Thus, applying these methods on large-scale datasets may...
Article
Full-text available
In this study, we propose an alternative approach to analyzing a domain-specific time series corpus for detecting word evolution. The method trains a target corpus in time series into a temporal word embedding (TWE) model. The advantage of TWE is that one can see how the meaning of a word changes over time. We have chosen the TWEC approach to model...
Article
Full-text available
Malay social media text is a text written on social media networks like Twitter. Commonly, this text comprises non-standard words, filled with dialects, foreign languages, word abbreviations, grammatical neglect, spelling errors, and many more. It is well known that this type of text is difficult to process due to its high noise and distinct text s...
Article
Full-text available
The metaheuristic genetic algorithm (GA) is based on the natural selection process that falls under the umbrella category of evolutionary algorithms (EA). Genetic algorithms are typically utilized for generating high-quality solutions for search and optimization problems by depending on bio-oriented operators such as selection, crossover, and mutat...
Conference Paper
Full-text available
User reviews are important resources for many processes such as recommender systems and decision-making programs. Sentiment analysis is one of the processes that is very useful for extracting valuable information from these reviews. The data preprocessing step is important in the sentiment analysis process, in which suitable preprocessing metho...
Article
Full-text available
The determination and classification of natural language based on specified content and data set involves a process known as spoken language identification (LID). To initiate the process, useful features of the given data need to be extracted first in a mature process where the standard LID features have been previously developed by employing the u...
Article
Full-text available
One of the needs in adopting a crowdsourcing approach in software requirement system (SRS) is to be able to perform text analytics to gain insight or knowledge from the crowd’s feedback. One of the expected text analytic tasks is to analyze the feedback automatically, for example, to determine whether the feedback concerns the functional requirem...
Article
Full-text available
Information retrieval is a difficult process due to the overabundance of information on the web. Nowadays, search engines respond to user queries with too many results, although only a few are relevant. Therefore, the existing clustering methods that fail in clustering snippets (short texts) of web documents due to the low frequencies of document te...
Chapter
In this paper, we present the process of training the word embedding (WE) model for a small, domain-specific Malay corpus. In this study, a Word2vec model was trained on the Hansard corpus of the Malaysian Parliament for specific years. However, a specific setting of the hyperparameters is required to obtain an accurate WE model because changing one of th...
Article
Full-text available
The exponential growth of medical information over social networks has posed several challenging issues. One of these issues is configuring the drug-interactions and other medical-related entities. Adverse Drug Reaction (ADR) is one of these entities, and identifying it is crucial for determining drug-interactions. The liter...
Article
Full-text available
Adverse Drug Reaction (ADR) extraction is the process of identifying drug implications mentioned in social posts. Handling medical text for the identification of ADR is vital to research in terms of configuring the side effect and other medical-related entities within any medical text. However, investigating the role of such effect in the context o...
Article
Full-text available
The determination and classification of a recognized spoken language based on certain contents and datasets is known as the process of language identification (LID). The common process in carrying out LID entails the mandatory processing of data which enables the extraction of the necessary features for the process. The extraction involves a mature...
Article
Full-text available
Word sense disambiguation (WSD) is the process of identifying an appropriate sense for an ambiguous word. With the complexity of human languages, in which a single word can yield different meanings, WSD has been utilized in several domains of interest such as search engines and machine translation. The literature shows a vast number of technique...
Data
Comparison of results of hybrid PSO based on SensEval-2 and SensEval-3 corpora for each POS. (TIF)
Data
Retrieve the SemCor Sentences line contents by Java library of JSemCor. (TIF)
Article
Full-text available
Processing the meaning of words in social media texts, such as tweets, is challenging in natural language processing. Malay tweets are no exception because they demonstrate distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; borrowing foreign language terms in the context of Malay language; and using mixed langua...
Article
Full-text available
Processing the meaning of words in social media texts, such as tweets, is challenging in natural language processing. Malay tweets are no exception because they demonstrate distinct linguistic phenomena, such as the use of dialects from each state in Malaysia; borrowing foreign language terms in the context of Malay language; and using mixed langua...
Article
Full-text available
Word Sense Disambiguation (WSD) is the process of computationally determining the exact sense of a particular word according to its context. This task plays an essential role in multiple fields of study such as Information Retrieval and Information Extraction. With the complexity of human language, WSD came up to solve the problem beh...
Article
Full-text available
Computer vision (CV) refers to the study of the computer simulation of human visual science. A major task of CV is to collect images (or video) so that they could be used for analysis, gathering information, and making decisions or judgements. CV has greatly progressed and developed in the past few decades. In recent years, deep learning (DL) approac...
Article
Full-text available
Currently, the high volume of international information exchange involves a wide range of localities. As each locality comes with its own distinctive dialect, the need for an effective means of language translation is becoming more and more apparent. Among the concerns of information professionals is the capacity of an interested party to access we...
Article
Full-text available
An Information Retrieval (IR) system aims to extract information based on a query made by a user on a particular subject from an extensive collection of text. IR is a process through which information is retrieved by submitting a query by a user in the form of keywords or to match words. In the Al-Quran, verses of the same or comparable topics are...
Article
Full-text available
Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. According to the literature, feature extraction for LID is a mature process where the standard features for LID have already been developed u...
Data
Provides the languages, YouTube channel names, and the URLs for every single channel that we have used to compile our dataset. (TXT)
Article
Full-text available
Stemming refers to the procedure of reducing all words appearing in different morphological variants to a common form. It is considered a useful technique in various areas of information retrieval and computational linguistics. In this paper, we introduced the Vocabulary Based Stemmer (VBS) as the alternative solution...
Article
Full-text available
Automatic text categorization (ATC) has attracted the attention of the research community over the last decade as it frees organizations from the need to organize documents manually. The ensemble techniques, which combine the results of a number of individually trained base classifiers, always improve classification performance better than base cl...
Article
Processing social media text such as tweets is one of the challenges of natural language processing. In contrast to the highly edited genres (standard language) for which traditional NLP tools were developed, conversational text contains many syntactic patterns and non-standard lexical items. These are the outcomes of dialectal variation, diversity in t...
Article
Nowadays, the use of short text has increased dramatically, and many applications rely on it, such as mobile messaging, breaking news, social media, and queries. The key challenge of short text lies in the limited context information that can be acquired from it. This limitation increases both sparsity and am...
Article
Full-text available
Information retrieval is the process of analysing a typed query and retrieving relevant documents according to that query. Several issues can significantly affect the effectiveness of information retrieval. One common issue is word ambiguity, where a single word can yield several meanings. The process of identifyin...
Article
With the dramatic expansion of information over the internet, users around the world express their opinions daily on social networks such as Facebook and Twitter. Large corporations nowadays invest in analyzing these opinions in order to assess their products or services by learning people's feedback on their business. The process of knowing...
Article
Full-text available
Named Entity Recognition (NER) is the field of recognizing nouns such as names of people, corporations, places and dates. The process of extracting NEs relies mainly on supervised machine learning techniques. Hence, utilizing proper features has a significant impact on the performance of recognizing the entities. Several approaches have been p...
Article
Full-text available
Named Entity Recognition (NER) is the field of identifying proper nouns such as names of people, corporations, places and dates. Recently, extracting information from web pages has caught researchers’ attention because of the valuable information that lies in such pages. The common valuable information is the NEs. However, web pages contain mor...
Article
Developing a complete and usable Text-to-Speech (TTS) system requires years of time, many hours of human work, and extensive knowledge from various fields. However, with software tools that are simple and easy to use and understand, the burden of developing a complete TTS for a specific language can be overcome. Thus, such existing softwa...
Article
Multi-label text classification has become progressively more important in recent years, where each document can be given multiple labels concurrently. Multi-label text classification is a challenging task because of the large space of all potential label sets, which is exponential in the number of candidate labels. Among the disadvantages of...
Article
The task of assigning proper meaning to an ambiguous word in a particular context is termed word sense disambiguation (WSD). We propose a genetic algorithm, improved by local search techniques, to maximise the overall semantic similarity or relatedness of a given text. Local search is used because of the inefficiency of population-based algorithms...
Article
Cross-Language Plagiarism Detection (CLPD) is used to automatically identify and extract plagiarism among documents in different languages. The main challenge of cross-language plagiarism detection is the difference of text languages, where the original source can be analysed and translated, and plagiarism can be detected automatically by comparing su...
Article
With the exponential growth of textual information available from the Internet, there has been an emergent need to find relevant, in-time and in-depth knowledge about business topics. The huge size of such data makes manually retrieving, analyzing, and using the valuable information in such texts a very difficult task. In this pape...
Article
Previous research proved that a complete and usable Malay Text-to-Speech (TTS) system based on formant synthesis could be developed within a short period of time without in-depth knowledge in relevant fields. The speech produced, however, has still been influenced by Indonesian and English pronunciation. This has led to this research, which intended to imp...
Article
Word sense disambiguation (WSD) is the process of eliminating the ambiguity of some words by identifying the exact sense of a given word. In natural languages, many words can yield multiple meanings depending on the context. WSD aims to identify the most accurate sense for such cases. In particular, when translating one language to another,...
Article
Full-text available
Word Sense Disambiguation (WSD) is the task of determining which sense of an ambiguous word (word with multiple meanings) is chosen in a particular use of that word, by considering its context. A sentence is considered ambiguous if it contains ambiguous word(s). Practically, any sentence that has been classified as ambiguous usually has multiple in...
Article
The methods and background introduced in this article concern the interpretation of the Quranic text in English translation using word sense disambiguation. Three semantic similarity measures (Wu-Palmer, Lin, and Jiang-Conrath) and their combination were used to identify word senses in the English Quranic text, in which comparison and...