Mustafa Mat Deris

  • PhD
  • Adjunct Professor at Telkom University

About

  • Publications: 286
  • Reads: 147,606
  • Citations: 3,117
Introduction
Currently we are working on incomplete information systems. We use rough set theory to improve the accuracy
Current institution
Telkom University
Current position
  • Adjunct Professor

Publications

Article
Full-text available
Data classification and feature/attribute selection approaches play an important role in enabling organizations to extract meaningful insights from vast and complex datasets, where accuracy and processing time are the two parameters of interest in determining which approach is suitable for enormous data. Moreover, the presence of redundant,...
Article
Full-text available
The linear regression model is one of the most common and easiest algorithms used in machine learning for predictive analysis purposes. However, this model performs well under strict assumptions such as the number of observations, the linearity of variables, multicollinearity, homoskedasticity, reliability of measurement, and normality. Besides, th...
Article
Identifying buildings for safety purposes is critical to anticipate unforeseen scenarios during a disaster. Rapid Visual Screening (RVS) is one of the procedures that can be used to determine a building's hazardous structure. The growing number of buildings necessitates grouping to provide recommendations for improving the analysis or conducting a...
Article
Full-text available
Identifying building conditions for user safety is an urgent matter, especially in earthquake-prone areas. Clustering buildings by condition into the categories of danger, vulnerable, normal, and safe provides important information for residents and the government to take further action. This study introduces a new method, namely hybrid mu...
Chapter
Categorical data clustering is still an issue due to difficulties/complexities of measuring the similarity of data. Several approaches have been introduced and recently the centroid-based approaches were introduced to reduce the complexities of the similarity of categorical data. However, those techniques still produce high computational times. In...
Chapter
This paper proposes a new clustering technique for handling categorical data, called Parametric Soft Set (PSS). It is based on a statistical distribution, namely the multinomial multivariate function. The probability of a data category with a binary value can be calculated by the binomial distribution. Its generalization called multinomial distribution function...
Article
Full-text available
Every development activity is always related to human or community aspects. This can also lead to changes in the characteristics of the community. The community's increasing awareness and critical attitude need to be accommodated to avoid the emergence of social conflicts in the future. This research is to find out how the public perception about t...
Article
Full-text available
Significant business implications and effective handling of supply side exceptions require a successful Supplier Base Management (SBM). The process of clustering manages the number of suppliers by grouping them on the basis of similar characteristics that reduces the number of variables impacting the operations. Several existing categorical cluster...
Article
One interesting and meaningful kind of information hidden in transactional databases is the indirect association rule. It corresponds to a high dependency between two items that rarely occur together but emerge indirectly via other items. Since indirect association rule is nontrivial information, it can implicitly give a...
Article
The symmetry triangular fuzzy number has been developed to build fuzzy autoregressive models by using various approaches such as low-high data, integer number, measurement error, and standard deviation data. However, most of these approaches are not simulated and compared between ordinary least square and fuzzy optimization in parameter estimation....
Chapter
Full-text available
Rough sets approximations have been implemented to handle categorical attributes from various domain problems. However, few studies investigate the combination between dependency degree of rough sets and chi-square test in decision support of the categorical data analysis. In this paper, we are interested to decide the healthy status based on healt...
Chapter
Full-text available
Classification of genres is among the important tasks of musical knowledge discovery. It may affect the accuracy of finding results or reducing the processing time when looking for a certain musical genre in an internet context. While the genre classification scheme looks very promising for western genres, the genre of non-western still has no spac...
Chapter
Ontology is used as a knowledge representation of a particular domain that consists of the concepts and two relations, namely the taxonomic relation and the non-taxonomic relation. In ontology, both relations are needed to give more knowledge about the domain texts, especially the non-taxonomic components that are used to describe more about that domain. Mos...
Article
Full-text available
In this paper, we investigate the possibility of extending Latin Square of Order 4 in Hybrid Cube Encryption Algorithm (HiSea) and Three-Dimensional Hybrid Cubes Encryption (3D-HiSea) algorithm using Latin square of order 8. The objective of this research is to investigate the security improvement of key provided in current algorithm using Four-Dim...
Article
Full-text available
Text classification is imperative in order to search for more accessible and appropriate information. It is utilized in various fields, including marketing, security, biomedical, etc. Apart from its usefulness, the available classifiers are vulnerable to two major problems, namely long processing time and low accuracy. They can result from a large amo...
Book
This book provides an introduction to data science and offers a practical overview of the concepts and techniques that readers need to get the most out of their large-scale data mining projects and research studies. It discusses data-analytical thinking, which is essential to extract useful knowledge and obtain commercial value from the data. Also...
Article
Full-text available
Modern cloud computing data centres contain virtualization systems. Proper virtualization is needed to improve network stability, energy efficiency, and makespan. The virtual machine is one example of virtualization. Cloud computing data centres consist of millions of virtual machines to manage load balancing. In this study check...
Article
Full-text available
A key scheduling algorithm is the mechanism that generates and schedules all session-keys for the encryption and decryption process. The key space of conventional key schedule algorithm using the 2D hybrid cubes is not enough to resist attacks and could easily be exploited. In this regard, this research proposed a new Key Schedule Algorithm based o...
Chapter
Dimension reduction approach is one of the main data reduction approaches in order to reduce the storage and processing time while maintaining the integrity of the original data. A wide range of dimension reduction approaches are based on classical approaches such as PCA and Bayer’s, and machine learning approaches such as clustering, and feature s...
Article
Nowadays, with the increasing availability of online text documents, it becomes an important task for an organization to automatically classify the document. In Text Classification (TC), Support Vector Machine is the commonly used machine-learning algorithm. Performance of SVM highly depends on parameter tuning using metaheuristic algorithm for tex...
Article
Full-text available
Hierarchical clustering groups similar entities on the basis of some similarity (or distance) association and results in a tree-like structure called a dendrogram. Dendrograms represent clusters in a nested manner, where at each step an entity makes a new cluster or merges into an existing cluster. Hierarchical clustering has many applications, ther...
Article
Full-text available
Since the number of electronic documents is increasing dramatically, document classification has become a very important task for organising information automatically. Text in documents is high-dimensional data that creates difficulties in the classification task. So, the feature ranking techniques are necessary to reduce the dimensionality of th...
Article
Full-text available
The issue of data uncertainty is very important in categorical data clustering, since the boundaries between created clusters are arguable. Therefore, an algorithm called Maximum Attribute Relative (MAR), based on the attribute relative of soft-set theory, was proposed previously. MAR exploiting the data uncertainties in multi-value info...
Conference Paper
In an iris recognition system, feature extraction is one of the most important stages, where the unique features of the iris are extracted. Several methods of achieving this have been proposed in order to extract the distinct features that are unique to each individual. The aim of this research is to propose a new Legendre wavelet filter that will decompose the i...
Chapter
The growing size and complexity of software systems make testing essential in software engineering. In particular, generating test cases effectively becomes a crucial task when source code grows and requirements change rapidly. Therefore, the selection of effective test cases becomes problematic, when the t...
Chapter
Ontology is a representation of knowledge with a pair of concepts and relationships within a particular domain. Most extraction techniques for non-taxonomic relations only identify concepts and relations in a complete sentence. However, this does not represent the domain completely since there are some sentences in a domain text that have a mis...
Book
These proceedings gather outstanding research papers presented at the Second International Conference on Data Engineering 2015 (DaEng-2015) and offer a consolidated overview of the latest developments in databases, information retrieval, data mining and knowledge management. The conference brought together researchers and practitioners from academi...
Article
In an iris recognition system, the segmentation stage is one of the most important stages, where the iris is located and then further segmented into the outer and lower boundaries of the iris region. Several algorithms have been proposed in order to segment the outer and lower boundaries of the iris region. The aim of this research is to identify the suitable thre...
Article
Full-text available
Cloud computing systems have gained popularity due to their ability to provide efficient data center storage management. A data center has to be available regardless of the large number of data requests. The performance of the system will degrade if data management in the data center is lacking. Data replication is one of the strategies to ensure the da...
Article
Full-text available
Cloud Computing is a new concept for pool of virtualized computer resources. There are many approaches available to improve the job scheduling and load balancing in cloud environment. However, this research focused on the Job scheduling in cloud computing environment at Virtual Machines level by considering their bandwidth and RAM size. Three (3) t...
Article
Algebraic functions are the primitives that strengthen cryptographic algorithms to ensure the confidentiality of data and information. There is a need for continuous development of new primitives and improvement of existing ones. Quasigroup String transformation is one of those primitives that have many applications in cryptographic algorithms, Hash functi...
Article
Full-text available
Jaccard index and rough set approaches have been frequently implemented in decision support systems with various domain applications. Both approaches are appropriate to be considered for categorical data analysis. This paper presents the applications of sets operations for flu diagnosis systems based on two different approaches, such as, Jaccard in...
Conference Paper
Rough set and regression approximations are useful in establishing decision support system for medical diagnostic applications. However, the data elimination strategy for unclassified elements or patients in the medical diagnostic applications remains as a serious issue to be explored, especially with the aim of achieving higher prediction accuracy...
Chapter
Most of the existing methods focus on extracting concepts and identifying the hierarchy of concepts. However, in order to provide the whole view of the domain, the non-taxonomic relationships between concepts are also needed. Most extraction techniques for non-taxonomic relations only identify concepts and relations in a complete sentence. Howeve...
Article
Various models presented for stock market forecasting have been classified according to data preparation, forecasting methodology, performance evaluation, and performance measure. However, these models have not sufficiently discussed data preparation to overcome randomness, as well as uncertainty and volatility of stock prices issues in...
Article
Full-text available
Customer care plays an important role in a company, especially in a telecommunication company. Churn is perceived as the behaviour of a customer to leave or to terminate a service. This behaviour causes a loss of profit to companies because acquiring new customer requires higher investment compared to necessary to consider an efficient classificatio...
Chapter
Statistical regression models have frequently been used to explain the causal relationship between exogenous factors and the cholesterol level of patients. However, the dominant criteria for each exogenous factor that affect the cholesterol level have not yet been investigated by previous studies. In this paper, we are interested to introduce...
Chapter
Full-text available
Iris recognition system is one of the most predominant methods used for personal identification in the modern days. Low quality iris image such as low contrast and poor illumination presents a setback for iris recognition as the acceptance or rejection rates of verified user depend solely on the image quality. This paper presents a new method for i...
Conference Paper
Full-text available
Accurate customer churn prediction is vital in any business organization due to higher cost involved in getting new customers. In telecommunication businesses, companies have used various types of single classifiers to classify customer churn, but the classification accuracy is still relatively low. However, the classification accuracy can be impro...
Conference Paper
Data clustering on categorical data poses a difficult challenge since there are no inherent distance measures between data values. One of the approaches that can be used is by introducing a series of clustering attributes in the categorical data. By this approach, Maximum Total Attribute Relative (MTAR) technique that is based on the attribute relat...
Conference Paper
Full-text available
Currently, optimization problems are among the immediate concerns in economics. People's needs are diversifying fast, while resources remain limited. This phenomenon is called the Multi-Objective Optimization (MOO) problem. Current techniques mostly suffer from redundancy, large path size, and long processing time. At this point in time, economic pr...
Article
Full-text available
With the prominent need for secure and reliable modes of identification in biometric systems, iris recognition has become a reliable method for personal identification. The system has been used for years in many commercial and government applications that allow access control in places such as office, laboratory, armoury, automated teller m...
Article
Full-text available
Cryptographic algorithms play an important role in information security, where they ensure the security of data across the network or in storage. The generation of Hybrid Cubes (HC) based on permutation and combination of integer numbers is utilized in the construction of encryption and decryption keys in the non-binary block cipher. In this study, we e...
Article
Full-text available
Security is the major concern when the sensitive information is stored and transferred across the internet where the information is no longer protected by physical boundaries. Cryptography is an essential, effective and efficient component to ensure the secure communication between the different entities by transferring unintelligible information a...
Conference Paper
Many models have been implemented in the energy sectors, especially in the electricity load consumption ranging from the statistical to the artificial intelligence models. However, most of these models do not consider the factors of uncertainty, the randomness and the probability of the time series data into the forecasting model. These factors giv...
Article
Full-text available
The advent of check digit methods has aided the detection of errors caused by human operations when information is typed wrongly. For instance, errors often occur when numbers or characters are typed wrongly into a database, and this may lead to many unwanted outcomes. Classical check digit systems are usually based on basic arithmetic o...
Article
Various binary similarity measures have been employed in clustering approaches to make homogeneous groups of similar entities in the data. These similarity measures are mostly based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical c...
Article
The main intention of proposing an alternative technique is to ensure that consistency is upheld while successfully reducing the file. Of all the reduction techniques currently available, only normal parameter reduction has managed to address the issue of consistency at optimal and suboptimal level. In this paper, we initiated another form of red...
Article
Full-text available
Sentiment analysis is the process of studying people's opinions, emotions, and ways of considering a matter and classifying them into categories such as positive, negative, and neutral in data mining. Sentiment analysis is used to find negation within the text using Natural Language Processing rules. Our aim is to detect negation affect on...
Conference Paper
A key derivation function is a function that generates one or more cryptographic keys from a private string together with some public information. The generated cryptographic key(s) must be indistinguishable from random binary strings of the same length. To date, there are key derivation function proposals designed using cryptographic primitives...
Article
Full-text available
A large number of bug reports are received daily in large open- and closed-source bug tracking systems. Dealing with these reports manually consumes time and resources, which delays the resolution of important bugs. As an important process in software maintenance, the bug triaging process carefully analyzes these bug reports to determine, for exam...
Article
Full-text available
Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering the rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Signi...
Conference Paper
This paper presents the applicability of soft set theory for discovering the preference relation in multi-valued information systems. The proposed approach is based on the notion of multi-soft sets. An inclusion of objects into value set of decision class in soft set theory is used to discover the relation between objects based on preference relati...
Conference Paper
Many statistical models have been implemented in the energy sectors, especially in the oil production and oil consumption. However, these models required some assumptions regarding the data size and the normality of data set. These assumptions give impact to the forecasting accuracy. In this paper, the fuzzy time series (FTS) model is suggested to...
Conference Paper
The evolution of technology in this era has contributed to the growth of abundant data. Data mining is a well-known computational process for discovering meaningful and useful information from large data repositories. There are various techniques in data mining that can deal with this situation, and one of them is association rule mining. Formal Co...
Conference Paper
Different binary similarity measures have been explored with different agglomerative hierarchical clustering approaches for software clustering, to make software systems understandable and manageable. Similarity measures have strengths and weaknesses that result in improving or deteriorating clustering quality. Determine whether strengths of th...
Article
Full-text available
Hybrid Cube (HC) is generated from a combination and permutation of integers. All possible combinations of hybrid cube layers are the source for the generation of encryption and decryption keys in the non-binary block cipher. This study extends the hybrid cube encryption algorithm (HiSea) and analyzes its security issues by increasing the comple...
Article
Full-text available
Feature ranking techniques are used to improve the performance of classification in text labeling problems. Most feature selection techniques utilize document and term frequencies to rank terms. In contrast to document frequency, term frequency supports real values of the term. Recent feature ranking techniques use term frequencies with freque...
Book
This book provides a comprehensive introduction and practical look at the concepts and techniques readers need to get the most out of their data in real-world, large-scale data mining projects. It also guides readers through the data-analytic thinking necessary for extracting useful knowledge and business value from the data. The book is based on...
Article
Full-text available
Cloud computing as an emerging technology attracts different scholars and organizations to explore its benefits. It brings together huge data resources from different geographical locations. Cloud resources are shared to achieve coherence and economic computing resource utilization. To this end, efficient and effective resource management is needed...
Article
Fuzzy time series has been implemented for data prediction in various sectors, such as education, finance and economics, energy, traffic accidents, and others. Moreover, many models have been proposed to improve the forecasting accuracy. However, the interval-length adjustment and the out-sample forecast procedure are still issues in fuzzy time...
Conference Paper
Cryptographic hash functions are used to protect the integrity of information. Hash functions are designed by using existing block ciphers as compression functions. This is due to the challenges and difficulties encountered in constructing new hash functions from scratch. However, the key generations for encryption process result to huge c...
Article
Full-text available
Ensemble methods, or multiple classifiers, which combine decisions from many base classifiers, have been confirmed to outperform the classification performance of any single classifier. Despite having the ability of producing the highest classification accuracy, ensemble methods have suffered significantly from their large volume of base classifiers....
Article
Full-text available
Query Optimization (QO) for DBMS is perhaps the most important application for searching and retrieving information in a shorter time. Information retrieval is a process of accessing data from relational database. The increase in database amount, number of tables, blocks in the database and the size of query make Query Optimization (QO) the centre...
Conference Paper
Full-text available
Cluster analysis automatically partitions the data into a number of different meaningful groups or clusters using clustering algorithms. Every clustering algorithm produces its own type of clusters. Therefore, the evaluation of clustering is very important to find the better clustering algorithm. There exist a number of evaluation measures whi...
Conference Paper
Sentiment analysis is the process of studying people's opinions, emotions, and ways of considering a matter and classifying them into categories such as positive, negative, and neutral in data mining. Sentiment analysis is used to find negation within the text using Natural Language Processing rules. Our aim is to detect negation affect on co...
Article
Full-text available
Mining agricultural data with artificial immune system (AIS) algorithms, particularly the clonal selection algorithm (CLONALG) and artificial immune recognition system (AIRS), forms the bedrock of this paper. The fuzzy-rough feature selection (FRFS) and vaguely quantified rough set (VQRS) feature selection are coupled with CLONALG and AIRS for impro...
Article
Frequent Pattern Tree (FP-Tree) is a compact data structure for representing frequent itemsets. The construction of the FP-Tree is very important prior to frequent pattern mining. However, only limited effort has focused specifically on constructing the FP-Tree data structure beyond its original database. In typical FP-Tree construction, b...
Article
Ensemble methods have been introduced as a useful and effective solution to improve the performance of the classification. Despite having the ability of producing the highest classification accuracy, ensemble methods have suffered significantly from their large volume of base classifiers. Nevertheless, we could overcome this problem by pruning some...
Article
A rough set is a mathematical tool to handle imprecise and imperfect information. It has been increasing in popularity recently in Knowledge Discovery in Database (KDD) and Machine Learning application. Rough set is one of the techniques used in KDD data mining. Data mining is an approach to extract useful information from a massive database for bu...
Article
This paper presents a comparison among different classifiers such as Naïve Bayes (NB), decision tree (J48), Sequential Minimal Optimization (SMO), Multi-Layer Perceptron (MLP), and Instance Based for K-Nearest neighbor (IBK) on water quality for datasets of Kinta River, Perak, Malaysia. Classification accuracy and confusion matrix were used in...
Chapter
In this paper we propose the Incremental Sequential PAttern Discovery using Equivalence classes (IncSPADE) algorithm to mine a dynamic database without the requirement of re-scanning the database. In order to evaluate this algorithm, we conducted experiments against three different artificial datasets. The result shows that IncSPADE outperf...
Article
Cryptographic hash functions are used to protect the integrity of information. Hash functions are implemented in applications such as message authentication codes, pseudo random number generators, and key derivation functions. This arguably suggests the need for continuous development of hash functions. Traditionally, hash functions are desig...
Conference Paper
The increase in database amount, number of tables, blocks in the database, and the size of query has led Multi Join Query Optimization (MJQO) to garner considerable attention in Database Management System research. Many applications often involve complex multiple queries which share a lot of common subexpressions (CSEs) to Identifying and exploiting the...
Conference Paper
Full-text available
This paper presents a comparison among different classifiers such as Naïve Bayes (NB), decision tree (J48), Sequential Minimal Optimization (SMO), Multi-Layer Perceptron (MLP), and Instance Based for K-Nearest neighbor (IBK) on water quality for datasets of Kinta River, Perak, Malaysia. Classification accuracy and confusion matrix were used in...
Article
Full-text available
In electrical power management, load forecasting accuracy is an indispensable factor which influences the decision making and planning of power companies in the future. Previous research has explored various forecasting models to resolve this issue, ranging from linear and non-linear regression to artificial intelligence algorithm. However, the abs...
Conference Paper
Distributed systems primarily provide access to data-intensive computation through a wide range of interfaces. Due to the advances of these systems, their scale and complexity have increased, making faults likely to happen and leading to diverse fault and failure conditions. Therefore, fault tolerance has become a crucial p...
Conference Paper
Full-text available
Content-based filtering is one of the most preferred methods to combat Short Message Service (SMS) spam. Memory usage and classification time are essential in SMS spam filtering, especially when working with limited resources. Therefore, suitable feature selection metric and proper filtering technique should be used. In this paper, we investigate h...
Article
Full-text available
As customers actively exercise their right to change to a better service, and since engaging new customers is more costly than retaining loyal customers, customer churn has become a main focus for organizations. This phenomenon affects many industries such as telecommunication companies which need to provide excellent service in order to...
Conference Paper
Full-text available
This paper presents a comparison of different ensemble classifiers based on 10-fold cross validation on water quality for datasets of Kinta River, Perak, Malaysia. In a single classifier method, the MLP and IBk classifier performed well for Kinta River datasets. These classifiers provided the highest accuracy on the datasets. In this experiment, th...
