Article

A Novel Cluster Detection of COVID-19 Patients and Medical Disease Conditions Using Improved Evolutionary Clustering Algorithm Star


Abstract

With the increasing number of samples, the manual clustering of COVID-19 and medical disease data samples becomes time-consuming and requires highly skilled labour. Recently, several algorithms have been used to cluster medical datasets deterministically; however, these deterministic approaches have not been effective in grouping and analysing medical diseases. The use of evolutionary clustering algorithms may help to cluster these diseases effectively. On this presumption, we improved the current evolutionary clustering algorithm star (ECA*), called iECA*, in three ways: (i) utilising the elbow method to find the correct number of clusters; (ii) cleaning and processing data as part of iECA* so that it can be applied to multivariate and domain-theory datasets; (iii) applying iECA* to real-world applications in clustering COVID-19 and medical disease datasets. Experiments were conducted to examine the performance of iECA* against state-of-the-art algorithms using cluster validation measures, statistical benchmarking, and a performance ranking framework. The results demonstrate three primary findings. First, iECA* was more effective than the other algorithms in grouping the chosen medical disease datasets according to the cluster validation criteria. Second, iECA* exhibited lower execution time and memory consumption than the other clustering methods analysed when clustering all the datasets. Third, an operational framework was proposed to rank the effectiveness of iECA* against the other algorithms on the datasets analysed, and the results indicated that iECA* exhibited the best performance in clustering all medical datasets. Further research is required on real-world multi-dimensional data containing complex knowledge fields for experimental verification of iECA* against other evolutionary algorithms.
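As a rough illustration of the elbow step in (i), the following Python sketch (scikit-learn assumed; the dataset is a synthetic stand-in, not the paper's medical data) scores candidate cluster counts by within-cluster sum of squares and picks the point farthest from the straight line joining the curve's endpoints:

```python
# Elbow-method sketch: choose k at the "knee" of the inertia curve.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # stand-in dataset

ks = list(range(1, 11))
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squares

# The elbow: the point with maximum distance from the line through the endpoints.
pts = np.column_stack([ks, inertias])
start, end = pts[0], pts[-1]
dx, dy = end - start
dist = np.abs(dy * (pts[:, 0] - start[0]) - dx * (pts[:, 1] - start[1])) / np.hypot(dx, dy)
best_k = ks[int(np.argmax(dist))]
print("suggested number of clusters:", best_k)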


... A cluster is regarded as a complete event group, and each category is regarded as an event in the complete event group with a probability of p_i. When the probabilities of all categories are equal, that is, when the sample sizes in all categories are equal, the information entropy of this subcluster has the maximum value [16]. In other words, if p_i = 1/k, the entropy of the clustering is largest, as shown in ...
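The entropy expression truncated in this excerpt is the standard Shannon form; as a reconstruction (our notation, not necessarily the cited paper's), concavity of the logarithm gives the claimed maximum:

```latex
H(C) = -\sum_{i=1}^{k} p_i \log p_i
     = \sum_{i=1}^{k} p_i \log \frac{1}{p_i}
     \le \log \sum_{i=1}^{k} p_i \cdot \frac{1}{p_i}
     = \log k ,
```

with equality exactly when every p_i = 1/k (Jensen's inequality applied to the concave logarithm).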
... In Equation (16), T represents the time, n the number, m the group, and m′ the group corresponding to m. The average waiting time of the intelligent sensor can then be obtained from Equations (16) and (17). ...
Article
Full-text available
Since gateway meters cannot simultaneously realize online monitoring and remote verification, this study aims to explore the intelligent online monitoring and remote verification of the gateway meter. First, the similarity and related evaluation indexes of embedded sensors are analyzed based on relevant theories, such as embedded-sensor theory and clustering algorithms. Second, the gateway meter is tested on a standard vibration test bed, and the accuracy, timeliness, and environmental adaptability of its intelligent online system are tested. Finally, the remote verification of the gateway meter is carried out from three aspects: the error/load value, the secondary voltage drop, and the admittance test. The results show that when the intelligent online monitoring ability of the gateway meter is tested on the standard vibration test bed, the error of the same pilot is controlled to within 0.1 mm after multiple peak tests, and the error is within the allowable range. In the intelligent online monitoring system based on the embedded sensor and clustering algorithm, the vibration acceleration is -0.6 to 0.3 cm/s, the speed is -1 to 1 cm/s, and the displacement fluctuates between -0.8 and 0.8 cm/s. This shows that the intelligent online monitoring system can meet the performance requirements of online monitoring. In the process of remote verification of gateway meters, the active error and reactive error are 0.2%. The results of the secondary voltage drop and the admittance test show that the relevant technical indexes of the system meet the expected requirements. Therefore, the intelligent online monitoring and remote verification of gateway meters are discussed based on the embedded sensor and clustering algorithm, which provides a reference for the rapid development of gateway meters.
... They experimentally showed that ECA* outperformed its state-of-the-art counterparts [24]. They also demonstrated the efficiency of an improved version of this algorithm in detecting COVID-19 patients [25]. In another study, they used an adaptive version of ECA* to reduce the formal context to automate the process of deriving concept hierarchies from corpora. ...
Article
Full-text available
Data clustering is a method of dividing data points into similar groups. In the last few decades, there have been numerous advancements in algorithms and methodologies for the analysis of clustered data. Although some algorithms like K-means, due to their easy implementation and interpretability, are widely used for clustering, their capability of providing good solutions might be sensitive to the number of dimensions and clusters. Due to the advantages of metaheuristic-based clustering approaches like strong searchability, they have been widely used to tackle the clustering problem. Based on this premise, we propose a new bio-inspired red deer clustering algorithm (RDCA). Red deer algorithm (RDA) is a new metaheuristic inspired by the behavior of the Scottish red deer during a breeding season. First, the data space is partitioned into several divisions, each of which is represented by a disjunctive normal form expression using CLIQUE algorithm. Then a search strategy employing a modified version of RDA is pursued. Experiments are carried out to assess the performance of RDCA in comparison to eight traditional methods using six real-world and three generated datasets in terms of cluster validity measures, namely compactness, separation, combined, Turi, Davies Bouldin, and normalized mutual information score. The proposed method can handle multiple objective functions simultaneously and shows good performance when the data dimension is large. The results indicate that in most scenarios, the proposed method outperforms its counterparts. It is also shown that the proposed algorithm is less sensitive to increasing the number of clusters and dimensions.
... We will use the two COVID-19 databases. The Elbow technique is utilised to determine the optimal number of clusters consistently [20]. When Y_t is a member of C_k, the Eq. ...
Article
This paper describes a clustering technique for provinces and prefectures in Morocco and for countries of the world at risk of the COVID-19 epidemic. Based on the proposed method, we used the Moroccan COVID-19 dataset of August 18, 2021, the day with the highest number of new deaths. The COVID-19 dataset for countries is taken from Worldometer on November 25, 2021. In this study, we employed the K-Means algorithm, the Elbow and Silhouette methods, and statistical analysis using new 'Confirmed-Death' two-dimensional data for Moroccan prefectures and provinces and new 'Confirmed-Death-Recovered' three-dimensional data for world countries. Our results show that the clustering method generated three prefecture-province groups for Morocco with similar types of 'Confirmed-Death' cases, and is able to group world countries into four clusters with similar types of 'Confirmed-Death-Recovered' cases. Our study can be considered a model for all countries for the analysis of COVID-19, and it can help political leaders and health authorities make the right decisions.
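A minimal Python sketch of this K-Means-plus-silhouette workflow (scikit-learn assumed; the 'Confirmed-Death' matrix below is a synthetic stand-in, not the Moroccan data):

```python
# K-means over candidate k, scored by silhouette (higher = better separation).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=50.0, size=(75, 2))  # fake (confirmed, deaths) rows
X = StandardScaler().fit_transform(X)               # put both axes on one scale

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print("silhouette-preferred k:", best_k)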
... K-Means, the EM Algorithm, and GMM are three commonly used clustering algorithms in machine learning. Several clustering methods have been developed with the objective of finding the correct number of clusters [11], [12], [13], [14], [15]. In [16], the authors focus on utilizing Probabilistic Graphical Models for detecting COVID-19, resulting in excellent detection of the disease. ...
... Algorithms whose accuracy is highlighted in red in the above table were used as input for the voting classifier. As its name suggests, it takes the average of the predictions from all algorithms and produces the final prediction for the model; since not every well-performing algorithm responds quickly, it is also vital to demonstrate how fast an algorithm solves various real-world problems [37]. According to Table 9, RF performed best among the others; it achieved 98.69% and was applied in this framework to predict diabetes. ...
Article
Full-text available
Today, diabetes is one of the most common chronic diseases in the world due to people's sedentary lifestyles, which have led to many health issues such as heart attack, kidney failure, and blindness. Additionally, most people are unaware of the early-stage symptoms of diabetes that would allow them to prevent it. These reasons encouraged the development of a diabetes prediction system using machine learning techniques. The Pima Indian Diabetes Dataset (PIDD) was utilized for this framework, as it is a common and appropriate dataset in CSV format. While there were no duplicate or null values, some zero values were replaced, four outlier records were removed, and data standardization was performed on the dataset. The project methodology was divided into two phases of model selection. In the first phase, two different hyperparameter techniques (Randomized Search and TPOT (AutoML)) were used to increase the accuracy level of each algorithm, and six different algorithms (Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor, Support Vector Machine, and Naïve Bayes) were applied. In the second phase, the four best-performing algorithms (with the best estimated parameters for each of them) were chosen and used as input for the voting classifier, which finds the best prediction among a group of multiple options. The result was satisfying: Random Forest achieved 98.69% in the second stage, compared with an accuracy of 81.04% in the first, and it was utilized to predict diabetes via a simple graphical user interface.
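A hedged sketch of the second-phase voting setup (scikit-learn assumed; the data, estimator choices, and parameters below are illustrative stand-ins, not the paper's tuned models):

```python
# Soft-voting ensemble over several tuned base classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for the preprocessed PIDD table (8 features, binary outcome).
X, y = make_classification(n_samples=768, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

vote = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",  # averages predicted probabilities across the four models
)
vote.fit(X_tr, y_tr)
print("held-out accuracy:", vote.score(X_te, y_te))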
... This approach works on the assumption that similar users have similar tastes and preferences, and that a target user is likely to give positive ratings to items that have already been positively rated by similar users. The same argument can also be made for items [20,51,52]; item-based CF algorithms assume that similar items are likely to be rated similarly (positively or negatively) by users. ...
Article
Full-text available
Recommendation Systems (RSs) have significant applications in many industrial systems. The duty of a recommender algorithm is to operate on the available data (users'/items' contextual data and the rating (or purchase) history for items) and to provide a recommendation list for any target user. The recommended items should be selected so that the target user is compelled to give them positive reviews. In this manuscript, we propose a novel RS algorithm that takes advantage of user-user trust relationships, rating histories, and their frequency of occurrence. We also provide a brand-new overlapping community detection algorithm. The information about the users' community structure is used to handle the cold-start and sparsity problems. We compare the performance of the proposed RS algorithm with a number of state-of-the-art algorithms on the extended Epinions dataset, which has information on both trust relations and the timing of the ratings. Numerical simulations reveal the superiority of the proposed algorithm over the others. We also investigate how the algorithms perform when only cold-start users and items are considered. As cold-start users (items) we consider those that have made (received) fewer than five ratings. The experiments show significant outperformance of the proposed algorithm over the others, which is mainly due to the use of information on overlapping community structures between users.
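To make the item-based variant mentioned in the excerpt concrete, here is a minimal NumPy sketch of similarity-weighted rating prediction (our illustration of plain item-based CF, not the paper's trust-aware algorithm; the rating matrix is a toy example):

```python
# Item-based CF: predict an unseen rating from similar, already-rated items.
import numpy as np

R = np.array([  # users x items rating matrix; 0 = unrated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / (np.outer(norms, norms) + 1e-9)
np.fill_diagonal(S, 0.0)

def predict(user, item):
    rated = R[user] > 0                        # items this user has rated
    w = S[item, rated]                         # their similarity to the target item
    if w.sum() == 0:
        return R[R[:, item] > 0, item].mean()  # fall back to the item average
    return float(w @ R[user, rated] / w.sum())

print("predicted rating, user 0 / item 2:", round(predict(0, 2), 2))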
... Nevertheless, this automatic GA-based method's primary shortcoming is that large DCNNs cause GA to slow down as its chromosomes grow too large. Other popular research works in this field are improved evolutionary clustering algorithm star (Hassan et al., 2021), hybrid genetic algorithm and machine learning method (Zivkovic et al., 2021), and hybrid Gabor filter and convolutional deep learning classification method (Barshooi & Amirkhani, 2022). ...
Article
Applying Deep Learning (DL) to radiological images (i.e., chest X-rays) is emerging because of the necessity of having accurate and fast COVID-19 detectors. Deep Convolutional Neural Networks (DCNN) have typically been used as robust COVID-19 positive-case detectors in these approaches. Such DCNNs tend to utilize Gradient Descent-Based (GDB) algorithms as trainers for the last fully connected layers. Although GDB training algorithms have simple structures and fast convergence rates for cases with large training samples, they suffer from manual tuning of numerous parameters, getting stuck in local minima, the requirement for large training sample sets, and inherently sequential procedures, and it is exceedingly challenging to parallelize them on Graphics Processing Units (GPUs). Consequently, the Chimp Optimization Algorithm (ChOA) is presented for training the DCNN's fully connected layers, in light of the scarcity of big COVID-19 training datasets and for the purpose of developing a fast COVID-19 detector capable of parallel implementation. Following that, two publicly accessible datasets termed COVID-Xray-5k and COVIDetectioNet are used to benchmark the proposed detector, known as DCNN-ChOA. In order to make a fair comparison, two structures are proposed: i-6c-2s-12c-2s and i-8c-2s-16c-2s, all of which have had their hyperparameters fine-tuned. The outcomes are evaluated in comparison to a standard DCNN, a hybrid DCNN plus Genetic Algorithm (DCNN-GA), and a Matched Subspace classifier with Adaptive Dictionaries (MSAD). Due to the large variation in results, we employ a weighted average of an ensemble of ten trained DCNN-ChOA models, with the validation accuracy being used to determine the final weights. The validation accuracy for the mixed ensemble DCNN-ChOA is 99.11%. The LeNet-5 DCNN's ensemble detection accuracy on COVID-19 is 84.58%. Comparatively, the suggested DCNN-ChOA yields over 99.11% accurate detection with a false-alarm rate of less than 0.89%. The outcomes show that DCNN-ChOA can deliver noticeably superior results compared with comparable detectors. The Class Activation Map (CAM) is another tool used in this study to identify probable COVID-19-infected areas. The results show that the highlighted regions are completely consistent with clinical outcomes, which has been verified by experts.
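The validation-accuracy-weighted ensembling step can be illustrated with a short NumPy sketch (all accuracies and probabilities below are placeholders, not the paper's values):

```python
# Weighted ensemble: combine detector outputs, weighted by validation accuracy.
import numpy as np

val_acc = np.array([0.97, 0.98, 0.96, 0.99, 0.95])  # one entry per trained detector
probs   = np.array([0.91, 0.85, 0.95, 0.88, 0.90])  # each model's P(positive) for one X-ray

weights = val_acc / val_acc.sum()                   # normalize accuracies into weights
p_ensemble = float(weights @ probs)                 # weighted average of the ensemble
label = "positive" if p_ensemble >= 0.5 else "negative"
print("ensemble P(positive) = %.3f -> %s" % (p_ensemble, label))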
... In this paper, we focus on the second one, which is useful for finding hidden patterns among unlabeled data based on distance criteria, while supervised learning is employed for classification and regression modeling [16]. Recently, unsupervised ML has been used to study COVID-19 disease in order to find hidden patterns and information about the most relevant factors and comorbidities related to the most severe forms of the disease [17][18][19]. Supervised ML and classification algorithms have been implemented to predict COVID-19 prognosis and outcomes based on hematochemical parameters [20]. Nevertheless, no prior work has combined clustering methods with blood tests for analyzing convalescent COVID-19 patients. ...
... So, clustering belongs to unsupervised learning in machine learning (Sinaga and Yang, 2020). It is widely used in customer classification (Deng and Gao, 2020;Li et al., 2021a;Sun et al., 2021), automatic medical image detection (Hassan et al., 2021), image retrieval (Karthikeyan and Aruna, 2013;Gu et al., 2019;Anju and Shreelekshmi, 2022), object recognition (Woźniak and Połap, 2018), data mining (Hosseini et al., 2010), (Sato et al., 2019), pattern recognition (Xu et al., 2022), (Singh and Ganie, 2021), and other fields. ...
Article
Full-text available
Clustering is an unsupervised learning technique widely used in the field of data mining and analysis. Clustering encompasses many specific methods, among which the K-means algorithm maintains its predominant popularity owing to its simplicity and efficiency. However, its efficiency is significantly influenced by the initial solution, and it is susceptible to getting stuck in a local optimum. To eliminate these deficiencies of K-means, this paper proposes a quantum-inspired moth-flame optimizer with an enhanced local search strategy (QLSMFO). First, quantum double-chain encoding and quantum revolving gates are introduced in the initial phase of the algorithm, which can enrich population diversity and efficiently improve exploration ability. Second, an improved local search strategy based on the Shuffled Frog Leaping Algorithm (SFLA) is implemented to boost the exploitation capability of the standard MFO. Finally, poor solutions are updated using Levy flight to obtain a faster convergence rate. Ten well-known UCI benchmark datasets dedicated to clustering are selected for testing the efficiency of QLSMFO, which is compared with K-means and ten currently popular swarm intelligence algorithms. Meanwhile, the Wilcoxon rank-sum test and the Friedman test are utilized to evaluate the effect of QLSMFO. The simulation results demonstrate that QLSMFO significantly outperforms the other algorithms with respect to precision, convergence speed, and stability.
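The Levy-flight update applied to poor solutions can be sketched with Mantegna's algorithm (a generic formulation; beta and the step-size constant are illustrative, not the paper's settings):

```python
# Mantegna-style Levy flight step: heavy-tailed jumps to escape local optima.
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, beta=1.5, rng=np.random.default_rng(0)):
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)   # heavy-tailed step vector

x = np.zeros(5)                          # a poor candidate solution
x_new = x + 0.01 * levy_step(len(x))     # mostly small steps, occasional long jumps
print(x_new)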
... This is iterated until the maximum number of queries is reached. Based on the FFQS algorithm, Hassan et al. [8] proposed the Min-Max algorithm; compared with FFQS, this algorithm improves the consolidation stage. ...
Article
Full-text available
With the expansion of college enrollment, the number of college graduates has continued to grow, and the employment situation has become more and more severe. As a new form of employment, innovation and entrepreneurship are becoming more and more important in college teaching, and entrepreneurial success is crucial. This paper proposes an entropy-based active learning method (ALPCS), which is divided into three stages: selection, exploration, and consolidation. The main contents are as follows: in the selection stage, the fuzzy c-means algorithm is used to obtain the memberships of all samples, their Shannon entropy is then calculated, and finally the samples with large Shannon entropy are selected to generate an information subset (the larger the Shannon entropy, the greater the uncertainty, and the more information it contains). In the exploration stage, a distance-first strategy actively selects samples from the information subset to construct a skeleton set of clusters. If its size equals the real number of clusters, the method enters the consolidation phase; otherwise, the active learning method stops. In the consolidation phase, the non-skeleton points with the largest uncertainty are sequentially selected from the information subset to form queries with the points in the skeleton set until a must-link constraint is formed. In this stage, the minimum-symmetric-relative-entropy-first principle is used to reduce the number of queries. The ALPCS algorithm is compared and evaluated, and the final experimental results show that it performs well when the number of queries is large.
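A minimal sketch of the selection stage, assuming a membership matrix U from any fuzzy c-means implementation (here generated randomly purely for illustration):

```python
# Rank samples by the Shannon entropy of their fuzzy cluster memberships.
import numpy as np

rng = np.random.default_rng(0)
U = rng.dirichlet(np.ones(3), size=10)       # 10 samples x 3 clusters, rows sum to 1

H = -np.sum(U * np.log(U + 1e-12), axis=1)   # per-sample Shannon entropy
informative = np.argsort(H)[::-1][:4]        # highest entropy = most uncertain samples
print("query these samples first:", informative)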
... For additional research in the future, HS can be hybridized with several algorithms for healthcare problems to further validate its efficiency, such as the backtracking search optimization algorithm [94][95][96], the variants of evolutionary clustering algorithm star [97][98][99][100], chaotic sine cosine firefly algorithm [101], shuffled frog leaping algorithm [102] and hybrid artificial intelligence algorithms [103]. Furthermore, HS can be applied to more complex and real-world applications to explore more deeply the advantages and drawbacks of the algorithm or improve its efficiencies, such as engineering application problems [101], wind speed prediction [104][105][106][107][108][109][110], traffic flow prediction [111], laboratory management [112], e-organization and e-government services [113], online analytical processing [114], web science [115], and the Semantic Web ontology learning [116]. ...
Article
One of the popular metaheuristic search algorithms is Harmony Search (HS). It has been verified that HS can find solutions to optimization problems thanks to its balanced exploratory and convergence behavior and its simple and flexible structure. This capability makes the algorithm preferable for several real-world applications in various fields, including healthcare systems, different engineering fields, and computer science. The popularity of HS urges us to provide a comprehensive survey of the literature on HS and its variants in health systems, analyze its strengths and weaknesses, and suggest future research directions. In this review paper, the current studies and uses of Harmony Search are surveyed in four main domains: (i) the variants of HS, including its modifications and hybridization; (ii) a summary of the previous review works; (iii) applications of HS in healthcare systems; and (iv) a proposed operational framework for the applications of HS in healthcare systems. The main contribution of this review is to provide a thorough examination of HS in healthcare systems while also serving as a valuable resource for prospective scholars who want to investigate or implement this method.
... up with new methods that can be more effective and capable of yielding more satisfying results. For future reading, the authors advise that the reader could optionally read the following research works (Bryar A. Hassan, 2020; B. A. Hassan, 2020, 2021; B. A. Hassan, Ahmed, Saeed, and Saeed, 2016; B. A. Hassan and Qader, 2021; Hassan and Rashid, 2019, 2021a, 2021b; B. A. Hassan, Rashid, and Hamarashid, 2021; B. A. Hassan, Rashid, and Mirjalili, 2021; B. Hassan and Dasmahapatra, n.d.; Maaroof et al., 2022; Saeed, Hassan, and Qader, 2017). ...
Article
Full-text available
The competitive advantage of aspect-oriented programming (AOP) is that it improves the maintainability and understandability of software systems by modularizing crosscutting concerns. However, some concerns, such as logging or debugging, may be overlooked and end up entangled and distributed across the code base. AOP is a software development paradigm that enables developers to capture crosscutting concerns in separate aspect modules. Additionally, it is a novel notion that has the potential to improve the quality of software programs by removing the complexity involved in code tangles via the use of separation of concerns; as a result, it provides more modularity. During its early development, some believed that AOP-based software was easier to build and maintain than other implementations, on the premise that local improvements are easier to implement. Additionally, without appropriate visualization tools for both static and dynamic structures, crosscutting challenges may be difficult for developers and researchers to appreciate. In recent years, AspectJ has begun to enable the depiction of crosscutting concerns via the release of IDE plugins. This article explains aspect-oriented programming and how it may be used to improve the readability and maintainability of software projects. Additionally, it evaluates the challenges it presents to application developers and academics.
... Besides, the modifications as well as hybridizations of the algorithm have been a hot spot for scholars. But still, there is room for others to improve our introduced algorithm, as it can be hybridized with the most recent swarm algorithms, such as the backtracking search optimization algorithm [19,20], the variants of evolutionary clustering algorithm star [21,24-26], the chaotic sine cosine firefly algorithm [13], and hybrid artificial intelligence algorithms [22]. Furthermore, DCNN-G-HHO can be applied to more complex and real-world applications to explore more deeply the advantages and drawbacks of the algorithm or improve its efficiency, such as engineering application problems [13], laboratory management [49], e-organization and e-government services [23], online analytical processing [16], web science [14], Semantic Web ontology learning [17], cloud computing paradigms [34][35][36][37], and evolutionary machine learning techniques [18,26,46]. ...
Article
Full-text available
Automated brain tumor detection is becoming a highly important medical diagnosis research topic. In recent medical diagnoses, detection and classification are commonly addressed with machine learning and deep learning techniques. Nevertheless, the accuracy and performance of current models need to be improved for suitable treatments. In this paper, an improvement in deep convolutional learning is ensured by adopting enhanced optimization algorithms; thus, a Deep Convolutional Neural Network (DCNN) based on improved Harris Hawks Optimization (HHO), called G-HHO, is considered. This hybridization features Grey Wolf Optimization (GWO) and HHO to give better results, limiting the convergence rate and enhancing performance. Moreover, Otsu thresholding is adopted to segment the tumor portion, which emphasizes brain tumor detection. Experimental studies are conducted to validate the performance of the suggested method on a total of 2073 augmented MRI images. The technique's performance was ensured by comparing it with nine existing algorithms on these augmented MRI images in terms of accuracy, precision, recall, f-measure, execution time, and memory usage. The performance comparison shows that DCNN-G-HHO is much more successful than existing methods, especially with a scoring accuracy of 97%. Additionally, the statistical performance analysis indicates that the suggested approach is faster and utilizes less memory in identifying and categorizing brain tumor cancers on the MR images. The implementation of this validation is conducted on the Python platform. The relevant codes for the proposed approach are available at: https://github.com/bryarahassan/DCNN-G-HHO.
... This means that changes and developments in it continue, yielding new methods that can be more effective and capable of producing more satisfying results. For future work, the authors advise readers to hybridize and evaluate the currently proposed algorithm with the following research works: [47][48][49][50][51][52][53][54][55][56][57][58][59][60][61][62][63]. ...
Article
Full-text available
COVID-19, one of the most dangerous pandemics, is currently affecting humanity. COVID-19 is spreading rapidly due to its high transmissibility. Patients who test positive most often have mild to moderate symptoms such as cough, fever, sore throat, and muscle aches; in more severe cases, patients experience symptoms such as shortness of breath, which can lead to respiratory failure and death. Machine learning techniques for detection and classification are commonly used in current medical diagnoses; however, the accuracy and performance of current models must be improved, which motivates a neural network based on improved Particle Swarm Optimization (PSO), known as PSONN. This hybridization combines Particle Swarm Optimization and a neural network to improve results while controlling convergence and improving efficiency. The purpose of this study is to contribute to resolving this issue by presenting the implementation and assessment of machine learning models, using neural networks and Particle Swarm Optimization to help in the detection of COVID-19 in its early stages. To begin, we preprocessed data from a Brazilian dataset consisting primarily of early-stage symptoms. Following that, we implemented the Neural Network and Particle Swarm Optimization algorithms. We used precision, accuracy, recall, and F-measure tests to evaluate the neural network with Particle Swarm Optimization. Based on the comparison, this paper grouped the top seven ML models (Neural Networks, Logistic Regression, Naïve Bayes Classifier, Multilayer Perceptron, Support Vector Machine, BF Tree, and Bayesian Networks) and measured feature importance, among other aspects, to justify the differences between the classification models. Particle Swarm Optimization with a neural network is deployed to improve the efficiency of the detection method by more accurately predicting COVID-19 detection. Preprocessed datasets with important features are then fed into the testing and training phases as inputs. Particle Swarm Optimization was used in the training phase of the neural net to identify the best weights and biases. The highest accuracy achieved is 98.738% on training data and 98.689% on testing data.
... Because of its simplicity of implementation and its fewer parameters, many works have applied the SFLA to different applications (Fig. 6 shows the proposed operational framework of SFLA), and, besides, modifications as well as hybridizations of the algorithm have been a hot spot for scholars. But still, there is room for others to improve the algorithm, as it can be hybridized with the most recent swarm algorithms, such as the backtracking search optimization algorithm [12,13], the variants of evolutionary clustering algorithm star [73][74][75][76], the chaotic sine cosine firefly algorithm [77], and hybrid artificial intelligence algorithms [78]. Furthermore, SFLA can be applied to more complex and real-world applications to explore more deeply the advantages and drawbacks of the algorithm or improve its efficiency, such as engineering application problems [77], laboratory management [79], e-organization and e-government services [78], online analytical processing [79], web science [80], Semantic Web ontology learning [81], chronic wound image processing [82], signal detection processing [83], and concept drift detection in big social data [84]. ...
Article
Full-text available
Shuffled Frog Leaping Algorithm (SFLA) is one of the most widespread algorithms. It was developed by Eusuff and Lansey in 2006. SFLA is a population-based metaheuristic algorithm that combines the benefits of memetics with particle swarm optimization. It has been used in various areas, especially in engineering problems, due to its ease of implementation and limited number of variables. Many improvements have been made to the algorithm to alleviate its drawbacks, whether achieved through modifications or hybridizations with other well-known algorithms. This paper reviews the most relevant works on this algorithm. An overview of the SFLA is first conducted, followed by the algorithm's most recent modifications and hybridizations. Next, recent applications of the algorithm are discussed. Then, an operational framework of SFLA and its variants is proposed to analyze their uses in different cohorts of applications. Finally, future improvements to the algorithm are suggested. The main incentive for conducting this survey is to provide useful information about the SFLA to researchers interested in working on the algorithm's enhancement or application.
Article
Background Increasingly, patient medication adherence data are being consolidated from claims databases and electronic health records (EHRs). Such databases offer an indirect avenue to gauge medication adherence in our data‐rich healthcare milieu. The surge in data accessibility, coupled with the pressing need for its conversion to actionable insights, has spotlighted data mining, with machine learning (ML) emerging as a pivotal technique. Nonadherence poses heightened health risks and escalates medical costs. This paper elucidates the synergistic interaction between medical database mining for medication adherence and the role of ML in fostering knowledge discovery. Methods We conducted a comprehensive review of EHR applications in the realm of medication adherence, leveraging ML techniques. We expounded on the evolution and structure of medical databases pertinent to medication adherence and harnessed both supervised and unsupervised ML paradigms to delve into adherence and its ramifications. Results Our study underscores the applications of medical databases and ML, encompassing both supervised and unsupervised learning, for medication adherence in clinical big data. Databases like SEER and NHANES, often underutilized due to their intricacies, have gained prominence. Employing ML to excavate patient medication logs from these databases facilitates adherence analysis. Such findings are pivotal for clinical decision‐making, risk stratification, and scholarly pursuits, aiming to elevate healthcare quality. Conclusion Advanced data mining in the era of big data has revolutionized medication adherence research, thereby enhancing patient care. Emphasizing bespoke interventions and research could herald transformative shifts in therapeutic modalities.
Article
Background and purpose The diagnosis of sleep–wake disorders (SWDs) is challenging because of the existence of only few accurate biomarkers and the frequent coexistence of multiple SWDs and/or other comorbidities. The aim of this study was to assess in a large cohort of well‐characterized SWD patients the potential of a data‐driven approach for the identification of SWDs. Methods We included 6958 patients from the Bernese Sleep Registry and 300 variables/biomarkers including questionnaires, results of polysomnography/vigilance tests, and final clinical diagnoses. A pipeline, based on machine learning, was created to extract and cluster the clinical data. Our analysis was performed on three cohorts: patients with central disorders of hypersomnolence (CDHs), a full cohort of patients with SWDs, and a clean cohort without coexisting SWDs. Results A first analysis focused on the cohort of patients with CDHs and revealed four patient clusters: two clusters for narcolepsy type 1 (NT1) but not for narcolepsy type 2 or idiopathic hypersomnia. In the full cohort of SWDs, nine clusters were found: four contained patients with obstructive and central sleep apnea syndrome, one with NT1, and four with intermixed SWDs. In the cohort of patients without coexisting SWDs, an additional cluster of patients with chronic insomnia disorder was identified. Conclusions This study confirms the existence of clear clusters of NT1 in CDHs, but mainly intermixed groups in the full spectrum of SWDs, with the exception of sleep apnea syndromes and NT1. New biomarkers are needed for better phenotyping and diagnosis of SWDs.
Article
Data clustering is a machine learning method for unsupervised learning that is popular in the two areas of data analysis and data mining. The objective is to partition a given dataset into distinct clusters, aiming to maximize the similarity among data objects within the same cluster. In this paper, an improved honey badger algorithm called DELHBA is proposed to solve the clustering problem. In DELHBA, to boost the population's diversity and the performance of global search, the differential evolution method is incorporated into the algorithm's initial step. Second, the equilibrium pooling technique is included to help the standard honey badger algorithm (HBA) break free of local optima. Finally, the honey badger population individuals are updated with a Levy flight strategy to produce more potential solutions. Ten famous benchmark test datasets are utilized to evaluate the efficiency of the DELHBA algorithm and to contrast it with twelve of the currently most used swarm intelligence algorithms and k-means. Additionally, DELHBA's performance is assessed using the Wilcoxon rank-sum test and Friedman's test. The experimental results show that DELHBA has better clustering accuracy, convergence speed and stability compared with the other algorithms, demonstrating its superiority in solving clustering problems.
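The statistical comparison named above can be reproduced in outline with SciPy (the per-dataset accuracy arrays below are fabricated placeholders purely to show the calls):

```python
# Wilcoxon rank-sum and Friedman tests over per-dataset accuracy scores.
import numpy as np
from scipy.stats import friedmanchisquare, ranksums

rng = np.random.default_rng(0)
acc_delhba = rng.normal(0.92, 0.02, 10)   # accuracy on 10 datasets (illustrative)
acc_hba    = rng.normal(0.89, 0.02, 10)
acc_kmeans = rng.normal(0.86, 0.03, 10)

print("Wilcoxon rank-sum, DELHBA vs HBA:", ranksums(acc_delhba, acc_hba).pvalue)
print("Friedman across all three:", friedmanchisquare(acc_delhba, acc_hba, acc_kmeans).pvalue)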
Article
This research introduces an efficacious model for incremental data clustering using an Entropy weighted-Gradient Namib Beetle Mayfly Algorithm (NBMA). Here, feature selection is done based upon support vector machine recursive feature elimination (SVM-RFE), where the weight parameter is optimally fine-tuned using NBMA. After that, clustering is carried out utilizing the entropy weighted power k-means clustering algorithm, and the weight is updated employing the designed Gradient NBMA. Finally, incremental data clustering takes place, in which centroid matching is carried out based on the RV coefficient, whereas the centroid is updated based on a deep maxout network (DMN). The results show the better performance of the proposed method.
Chapter
Clustering by fast search and find of density peaks is a new density-based clustering algorithm, which is widely used in various fields owing to its simplicity and efficiency, unique parameters, and recognition of arbitrarily shaped clusters. However, selecting the cluster centers requires human participation, which makes the clustering result subjectively affected by the operator, thus reducing the availability of clustering and interrupting the fluency of the algorithm. In this study, to eliminate artificial participation in the selection of cluster centers, a weighted decision measurement slope change method is proposed to select cluster centers, and the F-Measure, ARI, and AMI of the algorithm are tested on UCI and synthetic datasets. Experimental results show that the proposed algorithm addresses the limitation of human participation in the selection of cluster centers and improves the clustering performance of the algorithm. Keywords: Clustering algorithm; Clustering by fast search and find of density peaks (DPC); Cluster centers; Decision metrics
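For orientation, here is a sketch of generic density-peaks center selection (standard rho-delta decision values with a fixed top-k cut, not the chapter's weighted-slope variant; data and kernel bandwidth are illustrative):

```python
# Density peaks (DPC) decision values: centers have high density AND high
# distance to any denser point, i.e. large gamma = rho * delta.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in ([0, 0], [3, 3], [0, 4])])

D = cdist(X, X)
rho = np.sum(np.exp(-(D / 0.5) ** 2), axis=1)   # local density (Gaussian kernel)

delta = np.empty(len(X))
for i in range(len(X)):
    higher = rho > rho[i]                       # points denser than i
    delta[i] = D[i, higher].min() if higher.any() else D[i].max()

gamma = rho * delta                             # decision value
centers = np.argsort(gamma)[::-1][:3]           # top-3 picked as cluster centers
print("chosen center indices:", centers)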
Article
Full-text available
Machine Learning (ML) is a part of Artificial Intelligence (AI) that designs and produces systems capable of learning from experience automatically, without being explicitly programmed. ML concentrates on improving computer programs that can access data and use it to learn for themselves. There are different algorithms in the ML field, but the most important questions that arise are: Which technique should be utilized on a dataset? And how should an ML algorithm be investigated? This paper presents answers to these questions. Besides, investigating and checking algorithms for a dataset will be addressed. In addition, it illustrates how to choose among the provided test options and how to assess metrics. Finally, researchers will be able to apply this research work to their datasets to select an appropriate model for them.
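A minimal version of the "which technique for this dataset?" workflow the paper describes, comparing candidate models under one cross-validation protocol (scikit-learn assumed; the dataset and the model list are illustrative choices, not the paper's):

```python
# Compare candidate models with the same 10-fold cross-validation protocol.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
candidates = {
    "logreg": LogisticRegression(max_iter=5000),
    "tree": DecisionTreeClassifier(random_state=0),
    "nb": GaussianNB(),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")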
Preprint
Full-text available
Clustering is a fundamental machine learning task which has been widely studied in the literature. Classic clustering methods follow the assumption that data are represented as features in a vectorized form through various representation learning techniques. As the data become increasingly complicated and complex, the shallow (traditional) clustering methods can no longer handle the high-dimensional data type. With the huge success of deep learning, especially the deep unsupervised learning, many representation learning techniques with deep architectures have been proposed in the past decade. Recently, the concept of Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community. Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and the large number of recent advances in this direction, in this paper we conduct a comprehensive survey on deep clustering by proposing a new taxonomy of different state-of-the-art approaches. We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering. Moreover, this survey also provides the popular benchmark datasets, evaluation metrics and open-source implementations to clearly illustrate various experimental settings. Last but not least, we discuss the practical applications of deep clustering and suggest challenging topics deserving further investigations as future directions.
Article
Residential spatial differentiation is a common phenomenon during the urbanization process of large cities. This paper studies residential differentiation in Hangzhou, China, by combining subdistrict-level (Jie Dao) data from a resident travel survey. After creating 29 demographic and accessibility variables for 74 sub-districts of Hangzhou, a mixed method approach is used to analyze the spatial differentiation of diverse social groups. This approach integrates factor analysis, cluster analysis, and spatial expression. The results show that people of different incomes and occupations show a differentiated choice of living space. High-income groups, middle-income professionals and technicians, and other members of the urban elite prefer to live in the central city. The tendency of suburbanization is not apparent, and new-built gentrification appears in certain emerging urban areas. In addition to accessibility of the central business district and employment concentration, accessibility of neighboring communities and accessibility of quality education resources have also become more prominent impacts on social-spatial differentiation.
Article
Early detection and prevention of Alzheimer's disease (AD) is an important and challenging task, and determining a precise and accurate diagnosis of Alzheimer's disease in its early stages is the most significant challenge. As a result, various studies on the early detection of Alzheimer's disease have been conducted. However, these techniques have a number of drawbacks, including higher computational costs, failure to incorporate data from multiple modalities, performance degradation due to differing data distributions between training and testing data, inability to capture affected brain regions, longer processing time, etc. To tackle these issues, we propose an Optimized VGG-16 architecture using the Arithmetic Optimization Algorithm (Optimized VGG-16 using AOA) for AD classification. Three major components are involved in this study: pre-processing, segmentation, and classification. The CAT12 toolkit is used to process the format of T1-weighted MRI images during pre-processing. The image enhancement techniques normalize the uneven light distribution, in which linear contrast stretching enhances the image contrast level. Finally, the Optimized VGG-16 using AOA effectively classifies the AD classes, namely the normal, mild dementia (severe cognitive decline), and late dementia (very severe cognitive decline) classes. The dataset images are chosen from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the Open Access Series of Imaging Studies (OASIS) dataset, and the Single Individual volunteer for Multiple Observations across Networks (SIMON) database. The experimental investigations provided superior classification performance compared with other existing methods.
Article
Full-text available
It is beneficial to automate the process of deriving concept hierarchies from corpora, since manual construction of concept hierarchies is typically a time-consuming and resource-intensive process. As such, the overall process of learning concept hierarchies from corpora encompasses a set of steps: parsing the text into sentences, splitting the sentences, and then tokenising them. After the lemmatisation step, the pairs are extracted using formal concept analysis (FCA). However, there might be some uninteresting and erroneous pairs in the formal context. Generating the formal context may be a time-consuming process, so formal context size reduction is required to remove uninteresting and erroneous pairs, taking less time to extract the concept lattice and the concept hierarchies accordingly. On this premise, this study aims to propose two frameworks: (1) a framework to review the current process of deriving concept hierarchies from corpora utilising formal concept analysis (FCA); (2) a framework to decrease the formal context's ambiguity of the first framework using an adaptive version of the evolutionary clustering algorithm (ECA*). Experiments are conducted by applying 385 sample corpora from Wikipedia to the two frameworks to examine the reduction in the size of the formal context, which leads to the concept lattice and concept hierarchy. The resulting lattice of the formal context is evaluated against the standard one using concept-lattice invariants. Accordingly, the homomorphism between the two lattices preserves the quality of the resulting concept hierarchies by 89% in contrast to the basic ones, and the reduced concept lattice inherits the structural relations of the standard one. The adaptive ECA* is examined against its four counterpart baseline algorithms (Fuzzy K-means, the JBOS approach, the AddIntent algorithm, and FastAddExtent) to measure execution time on random datasets with different densities (fill ratios). The results show that adaptive ECA* computes the concept lattice faster than the other mentioned competitive techniques at different fill ratios.
Article
Full-text available
This article presents the data used to evaluate the performance of evolutionary clustering algorithm star (ECA*) compared to five traditional and modern clustering algorithms. Two experimental methods are employed to examine the performance of ECA* against genetic algorithm for clustering++ (GENCLUST++), learning vector quantisation (LVQ), expectation maximisation (EM), K-means++ (KM++) and K-means (KM). These algorithms are applied to 32 heterogeneous and multi-featured datasets to determine which one performs well on the three tests. For one, the paper examines the efficiency of ECA* against its corresponding algorithms using clustering evaluation measures; these validation criteria are objective-function and cluster-quality measures. For another, it suggests a performance rating framework to measure the performance sensitivity of these algorithms to various dataset features (cluster dimensionality, number of clusters, cluster overlap, cluster shape and cluster structure). The contributions of these experiments are two-fold: (i) ECA* exceeds its counterpart algorithms in its ability to find the right number of clusters; (ii) ECA* is less sensitive to dataset features compared to its competitive techniques. Nonetheless, the experiments performed also demonstrate some limitations of ECA*: (i) ECA* is not fully applied on the premise that no prior knowledge exists; (ii) adapting and utilising ECA* in several real applications has not yet been achieved.
Article
Full-text available
Clustering is a commonly used method for exploring and analysing data where the primary objective is to categorise observations into similar clusters. In recent decades, several algorithms and methods have been developed for analysing clustered data. We notice that most of these techniques deterministically define a cluster based on the value of the attributes, distance, and density of homogeneous and single-featured datasets. However, these definitions are not successful in adding clear semantic meaning to the clusters produced. Evolutionary operators and statistical and multidisciplinary techniques may help in generating meaningful clusters. Based on this premise, we propose a new evolutionary clustering algorithm (ECA*) based on social class ranking and meta-heuristic algorithms for stochastically analysing heterogeneous and multifeatured datasets. The ECA* is integrated with recombinational evolutionary operators, Levy flight optimisation, and some statistical techniques, such as quartiles and percentiles, as well as the Euclidean distance of the K-means algorithm. Experiments are conducted to evaluate the ECA* against five conventional approaches: K-means (KM), K-means++ (KM++), expectation maximisation (EM), learning vector quantisation (LVQ), and the genetic algorithm for clustering++ (GENCLUST++). To that end, 32 heterogeneous and multifeatured datasets are used to examine their performance using internal, external, and basic statistical clustering performance measures, and to measure how sensitive their performance is to five features of these datasets (cluster overlap, the number of clusters, cluster dimensionality, the cluster structure, and the cluster shape) in the form of an operational framework. The results indicate that the ECA* surpasses its counterpart techniques in terms of the ability to find the right clusters. Significantly, compared to its counterpart techniques, the ECA* is less sensitive to the five properties of the datasets mentioned above. Thus, the order of overall performance of these algorithms, from best performing to worst performing, is the ECA*, EM, KM++, KM, LVQ, and the GENCLUST++. Meanwhile, the overall performance rank of the ECA* is 1.1 (where a rank of 1 represents the best-performing algorithm and a rank of 6 the worst-performing one) across the 32 datasets based on the five dataset features mentioned above.
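As a hedged illustration of the internal/external validation style used in such comparisons (scikit-learn metrics on synthetic data; not the ECA* implementation itself):

```python
# External validation needs ground-truth labels; internal validation does not.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, davies_bouldin_score, silhouette_score

X, y_true = make_blobs(n_samples=500, centers=5, random_state=0)
y_pred = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

print("ARI (external, needs ground truth):", adjusted_rand_score(y_true, y_pred))
print("silhouette (internal):", silhouette_score(X, y_pred))
print("Davies-Bouldin (internal, lower = better):", davies_bouldin_score(X, y_pred))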
Article
Full-text available
Recently, numerous meta-heuristic-based approaches have been proposed to reduce the computational complexities of several existing approaches that involve tricky derivations, very large memory space requirements, initial value sensitivity, etc. However, several optimization algorithms, namely the firefly algorithm, sine-cosine algorithm, and particle swarm optimization algorithm, have drawbacks such as computational complexity and slow convergence. To overcome such shortcomings, this paper develops a novel chaotic sine-cosine firefly (CSCF) algorithm with numerous variants to solve optimization problems. Here, the chaotic forms of two algorithms, namely the sine-cosine algorithm and the firefly algorithm, are integrated to improve convergence speed and efficiency, thus minimizing several complexity issues. Moreover, the proposed CSCF approach is operated under various chaotic phases, and the optimal chaotic variants containing the best chaotic mapping are selected. Numerous chaotic benchmark functions are then utilized to examine the performance of the CSCF algorithm. Finally, simulation results for engineering design problems are presented to demonstrate the efficiency, robustness, and effectiveness of the proposed algorithm.
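A minimal sketch of the "chaotic" idea, assuming a logistic map as the chaotic generator: the map replaces uniform random draws inside a sine-cosine-style position update. The update rule shown is illustrative, not the paper's CSCF variant.

```python
# Hedged sketch: substitute a logistic chaotic map for uniform random draws
# in a sine-cosine-style move toward the best-known solution.
import numpy as np

def logistic_map(x0, n, r=4.0):
    # r = 4 gives fully chaotic behaviour on (0, 1).
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)
        xs[i] = x
    return xs

chaos = logistic_map(0.7, 100)
position = np.zeros(2)
best = np.array([1.0, -1.0])                   # assumed current best solution
for t in range(100):
    r1 = chaos[t]                              # chaotic draw instead of rand()
    position += r1 * np.sin(2 * np.pi * r1) * (best - position)
```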
Chapter
Full-text available
Human-level diagnostic performance from intelligent systems often depends on large sets of training data. However, the amount of available data for model training may be limited for some diseases, which can cause the widely adopted deep learning models to generalize poorly. One simple alternative approach for small-class prediction is the traditional k-nearest neighbor (kNN). However, due to the non-parametric characteristics of kNN, it is difficult to combine kNN classification with the learning of the feature extractor. This paper proposes an end-to-end learning strategy to unify kNN classification and the feature extraction procedure. The basic idea is to enforce that each training sample and its K nearest neighbors belong to the same class while learning the feature extractor. Experiments on multiple small-class and class-imbalanced medical image datasets showed that the proposed deep kNN outperforms both kNN and other strong classifiers.
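The property this training enforces can be stated concretely: in the learned feature space, each sample's K nearest neighbours should carry its own label. The hedged sketch below only measures that agreement for a fixed embedding on synthetic data; the paper learns the extractor end-to-end so that this fraction approaches 1.

```python
# Sketch of the constraint the paper trains toward: measure, for a given
# embedding, the fraction of each sample's K nearest neighbours that share
# its label. Data are synthetic stand-ins.
import numpy as np

def knn_label_agreement(features, labels, k=5):
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude the sample itself
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the K nearest neighbours
    return (labels[nn] == labels[:, None]).mean()

rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
labs = np.repeat([0, 1], 50)
print(knn_label_agreement(feats, labs, k=5))
```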
Article
Full-text available
Cluster analysis is an essential tool in data mining. Several clustering algorithms have been proposed and implemented, most of which can find good quality clustering results. However, the majority of the traditional clustering algorithms, such as K-means, K-medoids, and Chameleon, still depend on being provided a priori with the number of clusters and may struggle to deal with problems where the number of clusters is unknown. This lack of vital information may impose additional computational burdens or requirements on the relevant clustering algorithms. In real-world data clustering problems, the number of clusters in data objects cannot easily be pre-identified, so determining the optimal number of clusters for a dataset of high density and dimensionality is quite a difficult task. Therefore, sophisticated automatic clustering techniques are indispensable because of their flexibility and effectiveness. This paper presents a systematic taxonomical overview and bibliometric analysis of the trends and progress in nature-inspired metaheuristic clustering approaches from the early attempts in the 1990s until today's novel solutions. Finally, key issues with the formulation of metaheuristic algorithms as a clustering problem and major application areas are also covered in this paper.
Article
Full-text available
In this data article, we present the data used to evaluate the statistical success of the backtracking search optimisation algorithm (BSA) in comparison with four other evolutionary optimisation algorithms. The data presented in this article relate to the research article entitled 'Operational Framework for Recent Advances in Backtracking Search Optimisation Algorithm: A Systematic Review and Performance Evaluation' [1]. Three statistical tests were conducted on BSA in comparison with the differential evolution algorithm (DE), particle swarm optimisation (PSO), artificial bee colony (ABC), and the firefly algorithm (FF). The tests are used to evaluate these algorithms and to determine which one can solve a specific optimisation problem, with respect to statistical success on 16 benchmark problems, taking several criteria into account. The criteria are the initial control parameters, the dimension of the problems, their search space, the number of iterations needed to minimise a problem, the performance of the computer used to code the algorithms and the programming style, balancing the effect of randomisation, and the use of different types of optimisation problems in terms of hardness and cohort. In addition, all three tests include the necessary statistical measures (Mean: mean solution; S.D.: standard deviation of the mean solution; Best: the best solution; Worst: the worst solution; Exec. Time: mean runtime in seconds; No. of succeeds: number of successful minimisations; and No. of Failure: number of failed minimisations).
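For readers reproducing such tables, the per-algorithm summary statistics listed above reduce to a few lines of Python; `results` and the success threshold `tol` are illustrative stand-ins, not values from the data article.

```python
# Sketch of the per-run summary statistics named above (Mean, S.D., Best,
# Worst, success/failure counts). `results` and `tol` are made-up values.
import numpy as np

results = np.array([0.01, 0.00, 0.12, 0.03, 0.00])  # best objective per run
tol = 1e-2                                          # success threshold (assumed)
summary = {
    "Mean": results.mean(),
    "S.D.": results.std(ddof=1),
    "Best": results.min(),
    "Worst": results.max(),
    "No. of succeeds": int((results <= tol).sum()),
    "No. of Failure": int((results > tol).sum()),
}
print(summary)
```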
Chapter
Full-text available
The “No Free Lunch” theorem states that, averaged over all optimization problems, without re-sampling, all optimization algorithms perform equally well. Optimization, search, and supervised learning are the areas that have benefited most from this important theoretical concept. The formulation of the initial No Free Lunch theorem soon gave rise to a number of research works, resulting in a suite of theorems that define an entire research field, with significant results in other scientific areas where successfully exploring a search space is an essential and critical task. The objective of this paper is to go through the main research efforts that contributed to this field, reveal the main issues, and disclose those points that are helpful in understanding the hypotheses, the restrictions, or even the inability to apply No Free Lunch theorems.
Article
Full-text available
This paper has two contributions. First, we introduce a clustering basic benchmark. Second, we study the performance of k-means using this benchmark. Specifically, we measure how the performance depends on four factors: (1) overlap of clusters, (2) number of clusters, (3) dimensionality, and (4) unbalance of cluster sizes. The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches the 4% level.
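A hedged sketch of the benchmark's protocol: cluster overlap is controlled through the spread of synthetic Gaussian blobs, and k-means recovery is scored against the ground truth (the 4% figure is the paper's finding, not reproduced here).

```python
# Sketch: vary cluster overlap via blob spread and score k-means recovery.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

for std in (0.5, 1.5, 3.0):                # larger std -> more cluster overlap
    X, y = make_blobs(n_samples=600, centers=4, cluster_std=std, random_state=0)
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
    print(f"std={std}: ARI={adjusted_rand_score(y, labels):.2f}")
```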
Article
Full-text available
Knowledge discovery from data demands that it shall be the data themselves that reveal the groups (i.e. the data elements in each group) and the number of groups. For the ubiquitous task of clustering, K-means is the most used algorithm, applied in a broad range of areas to identify groups where intra-group distances are much smaller than inter-group distances. As a representative-based clustering approach, K-means offers an extremely efficient gradient-descent approach to the total squared error of representation; however, it not only demands the parameter k, but also makes assumptions about the similarity of density among the clusters. Therefore, it is profoundly affected by noise. Perhaps more seriously, it can often be attracted to local optima despite its immersion in a multi-start scheme. We present an effective genetic algorithm that combines the capacity of genetic operators to conglomerate different solutions of the search space with the exploitation of the hill-climber. We advance a previous genetic-searching approach called GENCLUST with the intervention of fast hill-climbing cycles of K-means, and obtain an algorithm that is faster than its predecessor and achieves clustering results of higher quality. We demonstrate this across a series of 18 commonly researched datasets.
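A condensed sketch of the hybrid idea, not GENCLUST++ itself: a population of centre sets is improved by one Lloyd (k-means) step per generation, the fast hill-climbing cycle the paper interleaves with genetic search; selection and crossover are reduced to elitist mutation for brevity.

```python
# Hedged sketch of a GA/k-means hybrid: Lloyd steps as the hill-climber,
# elitist Gaussian mutation standing in for the full genetic operators.
import numpy as np

def sse(X, C):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.min(axis=1).sum()

def lloyd_step(X, C):
    # One k-means hill-climbing cycle: assign, then recompute centres.
    labels = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2).argmin(axis=1)
    return np.array([X[labels == j].mean(axis=0) if (labels == j).any() else C[j]
                     for j in range(len(C))])

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
pop = [X[rng.choice(len(X), 3, replace=False)] for _ in range(8)]
for gen in range(20):
    pop = [lloyd_step(X, C) for C in pop]          # exploitation
    best = min(pop, key=lambda C: sse(X, C))
    pop = [best + rng.normal(0, 0.2, best.shape)   # mutation around the elite
           for _ in range(7)] + [best]
print(sse(X, best))
```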
Conference Paper
Full-text available
Nowadays, task automation is an area of interest and research in most fields, because manual execution of a task is error-prone, time-consuming, demanding of human resources, and requires sustained focus. In computer laboratory administration, old-fashioned practice cannot keep pace with today's growth, where the operating system (OS) and required applications are installed on all machines one by one. Therefore, this research proposes a framework for automating lab administration with regard to operating system and application installations. Affordability, simplicity, and usability are taken into major consideration. All parts of the framework are implemented and illustrated in detail, which promotes a great enhancement in the area of computer lab administration.
Article
Full-text available
The k-nearest neighbors (KNN) algorithm is a common algorithm used for classification and also a sub-routine in various complicated machine learning tasks. In this paper, we present a quantum algorithm (QKNN) for implementing this algorithm based on the metric of Hamming distance. We put forward a quantum circuit for computing the Hamming distance between a test sample and each feature vector in the training set. Taking advantage of this method, we realize a good analog of the classical KNN algorithm by setting a distance threshold value t to select the k nearest neighbors. As a result, QKNN achieves O(n³) performance, which depends only on the dimension of the feature vectors, with high classification accuracy, outperforming Lloyd's algorithm (Lloyd et al. 2013) and Wiebe's algorithm (Wiebe et al. 2014).
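The classical analogue of this routine is easy to state: Hamming-distance kNN with an optional threshold t, as sketched below on made-up binary data.

```python
# Classical analogue of the quantum routine: Hamming-distance kNN with an
# optional distance threshold t, as in QKNN. Data are made up.
import numpy as np

def hamming_knn(train_X, train_y, x, k=3, t=None):
    d = (train_X != x).sum(axis=1)          # Hamming distance to every sample
    if t is not None:
        keep = d <= t                       # threshold selection, as in QKNN
        if keep.any():
            train_X, train_y, d = train_X[keep], train_y[keep], d[keep]
    nn = np.argsort(d)[:k]
    vals, counts = np.unique(train_y[nn], return_counts=True)
    return vals[counts.argmax()]            # majority vote

rng = np.random.default_rng(0)
Xb = rng.integers(0, 2, (100, 16))
yb = Xb[:, 0]                               # label correlated with first bit
print(hamming_knn(Xb, yb, Xb[0], k=5, t=6))
```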
Article
Full-text available
Fast and accurate fault classification is essential to power system operations. In this paper, in order to classify electrical faults in radial distribution systems, a particle swarm optimization (PSO) based support vector machine (SVM) classifier has been proposed. The proposed PSO based SVM classifier is able to select appropriate input features and optimize SVM parameters to increase classification accuracy. Further, a time-domain reflectometry (TDR) method with a pseudorandom binary sequence (PRBS) stimulus has been used to generate a dataset for purposes of classification. The proposed technique has been tested on a typical radial distribution network to identify ten different types of faults considering 12 given input features generated by using Simulink software and MATLAB Toolbox. The success rate of the SVM classifier is over 97%, which demonstrates the effectiveness and high efficiency of the developed method.
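A minimal sketch of the PSO-SVM coupling, assuming scikit-learn and a toy dataset in place of the TDR/PRBS fault data: particles encode (log C, log gamma) and fitness is cross-validated accuracy. The paper additionally uses PSO for feature selection, which is omitted here.

```python
# Hedged sketch of PSO tuning SVM hyperparameters on a toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=12, random_state=0)
rng = np.random.default_rng(0)
P = rng.uniform(-3, 3, (10, 2))             # positions: log10(C), log10(gamma)
V = np.zeros_like(P)
pbest, pbest_f = P.copy(), np.full(10, -np.inf)
gbest, gbest_f = P[0].copy(), -np.inf
for it in range(15):
    for i in range(10):
        f = cross_val_score(SVC(C=10 ** P[i, 0], gamma=10 ** P[i, 1]),
                            X, y, cv=3).mean()
        if f > pbest_f[i]:
            pbest_f[i], pbest[i] = f, P[i].copy()
        if f > gbest_f:
            gbest_f, gbest = f, P[i].copy()
    r1, r2 = rng.random(P.shape), rng.random(P.shape)
    V = 0.7 * V + 1.5 * r1 * (pbest - P) + 1.5 * r2 * (gbest - P)  # PSO update
    P = np.clip(P + V, -3, 3)
print(f"best CV accuracy {gbest_f:.3f} at C=10^{gbest[0]:.2f}, "
      f"gamma=10^{gbest[1]:.2f}")
```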
Article
Full-text available
Office automation is an initiative used to digitally deliver services to citizens and the private and public sectors. It is used to digitally collect, store, create, and manipulate office information needed to accomplish basic tasks. Azya Office Automation has been implemented as a pilot project in the Kurdistan Institution for Strategic Studies and Scientific Research (KISSR) since 2013. The efficiency of governance in KISSR has improved thanks to its implementation. The aims of this research paper are to evaluate user satisfaction with this software and to identify its significant predictors using the EGOVSAT Model. User satisfaction in this model encompasses five main parts: utility, reliability, efficiency, customization, and flexibility. For that purpose, a detailed survey was conducted to measure the level of user satisfaction. A total of sixteen questions were distributed among forty-one users of the software in KISSR. In order to evaluate the software, three measurements were used: a reliability test, regression analysis, and correlation analysis. The results indicate that the software is successful to a decent extent, based on user satisfaction feedback obtained using the EGOVSAT Model. Keywords: Office Automation, e-Organization, User Satisfaction, EGOVSAT Model
Article
Full-text available
Measuring the quality of a clustering algorithm has been shown to be as important as the algorithm itself. It is a crucial part of choosing the clustering algorithm that performs best for the input data. Streaming input data have many features that make them much more challenging than static data: they are endless, varying, and emerge at high speed. This has raised new challenges for clustering algorithms as well as for their evaluation measures. Until now, external evaluation measures were used exclusively for validating stream clustering algorithms. While external validation requires a ground truth, which is not provided in most applications, particularly in the streaming case, internal clustering validation is efficient and realistic. In this article, we analyze the properties and performance of eleven internal clustering measures. In particular, we apply these measures to carefully synthesized stream scenarios to reveal how they react to clusterings on evolving data streams, using both k-means-based and density-based clustering algorithms. A series of experimental results show that, unlike the case with static data, the Calinski-Harabasz index performs best in coping with common aspects and errors of stream clustering for k-means-based algorithms, while the revised validity index performs best for density-based ones.
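The Calinski-Harabasz index singled out here is available directly in scikit-learn, as the short sketch below shows on static synthetic data (the streaming setting of the article is not reproduced).

```python
# Quick sketch: compute the Calinski-Harabasz index for a k-means partition.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(calinski_harabasz_score(X, labels))   # higher means better separation
```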
Article
Full-text available
As information becomes increasingly sizable, the challenge of organizing data remains for organizations to address. More importantly, analysing incoming data is an ongoing, continual process, and existing procedures may not be adequate or efficient when attempting to access specific information to analyse. In these latter days of technological advancement, organizations can offer their customers extensive data resources to utilize and thus accomplish individual objectives and maintain competitiveness; however, providing data in a format that serves each client's needs remains a challenge. For some, the complexity of a data model can be overwhelming to utilize. Furthermore, companies should secure an understanding of the purchasing power of specific consumer groups to remain competitive and ease the operation of data analysis. This research paper examines the use of multi-dimensional models within a business environment and how they may allow customers and managers to generate queries that return accurate and relevant data for effective analysis. It also provides a new framework, and defines its requirements, to aid various types of organizations using sizable database systems in creating their own multi-dimensional model from relational databases and presenting the data in multi-dimensional views. Despite the availability of established tools, the complexity of the underlying concepts discourages customers, who may become apprehensive about exploring these options for analytical purposes; a syntactically valid query may still return incorrect information. A key suggestion for this issue is to encapsulate the relational schema behind defined business terms, relating query terminology and syntax to business articles. This outcome is made possible by the business-oriented procedure referred to as Online Analytical Processing (OLAP).
Article
Full-text available
Machine learning techniques have been widely used in many scientific fields, but their use in the medical literature is limited, partly because of technical difficulties. k-nearest neighbors (kNN) is a simple method of machine learning. This article introduces some basic ideas underlying the kNN algorithm and then focuses on how to perform kNN modeling with R. The dataset should be prepared before running the knn() function in R. After predicting the outcome with the kNN algorithm, the diagnostic performance of the model should be checked. Average accuracy is the most widely used statistic for reflecting the performance of the kNN algorithm. Factors such as the k value, the distance calculation, and the choice of appropriate predictors all have significant impact on model performance.
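The article works in R via knn(); for readers without R, a Python analogue of the same workflow (prepare and scale the data, fit, check average accuracy) might look as follows, using scikit-learn and a bundled dataset.

```python
# Python analogue of the R kNN workflow described above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(Xtr)          # distance-based models need scaling
model = KNeighborsClassifier(n_neighbors=5).fit(scaler.transform(Xtr), ytr)
print(model.score(scaler.transform(Xte), yte))  # average accuracy
```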
Article
Full-text available
Electronic nose technology is used in many areas, and frequently in the beverage industry for classification and quality-control purposes. In this study, four different sets of aroma data (strawberry, lemon, cherry, and melon) were obtained using a MOSES II electronic nose for the purpose of fruit classification. To improve classification performance, the training phase of a neural network with two hidden layers was optimized using the artificial bee colony (ABC) algorithm, which is known to be successful in exploration. Test data were given to two different neural networks, each trained separately with backpropagation (BP) and ABC, and average test performances were measured as 60% for the artificial neural network trained with BP and 76.39% for the artificial neural network trained with ABC. Training and test phases were repeated 30 times to obtain these average performance measurements. This level of performance shows that the artificial neural network trained with ABC is successful in classifying aroma data.
Article
Full-text available
There has recently been a growth of interest in developing the current machine-readable Web towards the next generation of machine-understandable Web - the Semantic Web. The development of the Web into a global business was reasonably fast, whereas the Semantic Web has taken time to move from a plan to mainstream use. It is also important to note that the use of the Semantic Web has so far been successful only in small technologies; however, the goal of the Semantic Web is to be used in big technologies and to become the mainstream Web. Some challenges may impede further progress of the Semantic Web. In this review paper, an overview of the current status and future needs of the Semantic Web is presented. Specifically, the challenges and needs of the Semantic Web are discussed in the hope of shedding some light on its adoption or infusion in the future. Then, a critical evaluation of these challenges and needs is presented. The Semantic Web has a clear vision and is moving, in line with this vision, towards overcoming the challenges and achieving usability in real-world applications.
Article
Full-text available
Microcosm, Hyper-G, and the Web were developed and released after 1989. There were strengths and weaknesses associated with each of these hypertext systems, and their architectures were relatively different from one another. Standing above its competitors, the Web became the largest and most popular information system. This paper analyses the reasons for which the Web became the first successful hypermedia system by examining and evaluating the architectures of the Web, Hyper-G, and Microcosm. Three reasons behind this success are given, with some lessons to learn. Currently, the Semantic Web is a recent development of the Web to provide conceptual hypermedia. More importantly, the study of the Web and its impact on technical, socio-cultural, and economic agendas is introduced as web science.
Article
Full-text available
In this work, we present a review of the state of the art of learning vector quantization (LVQ) classifiers. A taxonomy is proposed which integrates the most relevant LVQ approaches to date. The main concepts associated with modern LVQ approaches are defined. A comparison is made among eleven LVQ classifiers using one real-world and two artificial datasets.
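For orientation, the core LVQ1 update that the reviewed classifiers build on can be sketched in a few lines: the winning prototype moves toward a sample of its own class and away from one of a different class. The learning rate and data below are illustrative.

```python
# Minimal LVQ1 sketch: attract/repel the winning prototype per sample.
import numpy as np

def lvq1_fit(X, y, prototypes, proto_labels, lr=0.05, epochs=20):
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w = np.linalg.norm(prototypes - xi, axis=1).argmin()  # winner
            sign = 1.0 if proto_labels[w] == yi else -1.0
            prototypes[w] += sign * lr * (xi - prototypes[w])
    return prototypes

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.repeat([0, 1], 50)
protos = lvq1_fit(X, y, rng.normal(2, 1, (2, 2)), np.array([0, 1]))
```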
Conference Paper
Full-text available
Cluster analysis deals with the problem of organizing a collection of data objects into clusters based on similarity. It is also known as the unsupervised classification of objects and has found many applications in different areas. An important component of a clustering algorithm is the distance measure used to find the similarity between data objects. K-means is one of the most popular and widespread partitioning clustering algorithms due to its superior scalability and efficiency. Typically, the K-means algorithm determines the distance between an object and its cluster centroid by the Euclidean distance measure. This paper proposes a variant of K-means which uses an alternate distance measure, namely the Max-min measure. The modified K-means algorithm is tested with six benchmark datasets taken from the UCI machine learning data repository and is found to take fewer iterations to converge than the existing algorithm, with improved performance.
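The variation point of this work is the distance measure inside K-means. A sketch of K-means with a pluggable distance function makes that explicit; Euclidean is shown, and the paper's Max-min measure would slot in as another `dist` function (its exact definition is not reproduced here).

```python
# K-means with a pluggable distance function; Euclidean is the default,
# alternate measures such as the paper's Max-min measure would replace it.
import numpy as np

def kmeans(X, k, dist, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.array([dist(x, C).argmin() for x in X])
        C = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else C[j]
                      for j in range(k)])
    return labels, C

euclidean = lambda x, C: np.linalg.norm(C - x, axis=1)
X = np.random.default_rng(0).normal(size=(200, 2))
labels, C = kmeans(X, 3, euclidean)
```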
Article
Full-text available
A measure is presented which indicates the similarity of clusters that are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster. The measure can be used to infer the appropriateness of data partitions and can therefore be used to compare the relative appropriateness of various divisions of the data. The measure does not depend on either the number of clusters analysed or the method of partitioning of the data, and can be used to guide a cluster-seeking algorithm.
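This measure is what is now commonly known as the Davies-Bouldin index; scikit-learn implements it directly (lower is better), so the relative appropriateness of different partitions can be compared as sketched below.

```python
# Compare partitions with different k using the Davies-Bouldin index.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, davies_bouldin_score(X, labels))   # lower is better
```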
Conference Paper
Full-text available
Ensemble techniques have been successfully applied in the context of supervised learning to increase the accuracy and stability of classification. Recently, analogous techniques for cluster analysis have been suggested. Research has demonstrated that, by combining a collection of dissimilar clusterings, an improved solution can be obtained. In this paper, we examine the potential of applying ensemble clustering techniques with a focus on the area of medical diagnostics. We present several ensemble generation and integration strategies, and evaluate each approach on a number of synthetic and real-world datasets. In addition, we show that diversity among ensemble members is necessary, but not sufficient to yield an improved solution without the selection of an appropriate integration method.
Conference Paper
Full-text available
Understanding the results of a multi-objective optimization process can be hard. Various visualization methods have been proposed previously, but the only consistently popular one is the 2D or 3D objective scatterplot, which cannot be extended to handle more than three objectives. Additionally, the visualization of high-dimensional parameter spaces has traditionally been neglected. We propose a new method, based on heatmaps, for the simultaneous visualization of objective and parameter spaces. We demonstrate its application on a simple 3D test function and also apply heatmaps to the analysis of real-world optimization problems. Finally, we use the technique to compare the performance of two different multi-objective algorithms.
Article
Full-text available
Clustering is a common technique for statistical data analysis, used in many fields including machine learning, data mining, pattern recognition, image analysis, and bioinformatics. Clustering is the process of grouping similar objects into different groups, or more precisely, the partitioning of a data set into subsets so that the data in each subset are similar according to some defined distance measure. This paper covers clustering algorithms, their benefits, and their applications, and concludes by discussing some limitations.
Article
Backtracking search optimisation algorithm (BSA) is a commonly used meta-heuristic optimisation algorithm, proposed by Civicioglu in 2013. When it was first used, it exhibited strong potential for solving numerical optimisation problems, and the experiments conducted in previous studies demonstrated the successful performance of BSA and its non-sensitivity toward several types of optimisation problems. This success motivated researchers to work on expanding it, e.g., developing improved versions or employing it for different applications and problem domains. However, there is a lack of literature review on BSA; therefore, systematically reviewing these modifications and applications will aid further development of the algorithm. This paper provides a systematic review and meta-analysis that emphasises the related studies and recent developments of BSA. The objectives of this work are two-fold: (i) First, two frameworks for depicting the main extensions and uses of BSA are proposed. The first is a general framework depicting the main extensions of BSA, whereas the second is an operational framework presenting the expansion procedures of BSA to guide researchers who are working on improving it. (ii) Second, the experiments conducted in this study fairly compare the analytical performance of BSA with four other competitive algorithms: differential evolution (DE), particle swarm optimisation (PSO), artificial bee colony (ABC), and firefly (FF), on 16 different hardness scores of benchmark functions with different initial control parameters such as problem dimensions and search space. The experimental results indicate that BSA is statistically superior to the aforementioned algorithms in solving different cohorts of numerical optimisation problems, such as problems with different levels of hardness score, problem dimensions, and search spaces. This study can act as a systematic and meta-analytic guide for scholars working on improving BSA.
Article
Document clustering is the gathering of text documents into groups or clusters. The main aim is to cluster documents that are internally coherent but considerably different from each other. It is a crucial process used in information retrieval, information extraction, and document organization. In recent years, spectral clustering has been widely applied in the field of machine learning as an innovative clustering technique. This research work proposes a novel spectral clustering algorithm with particle swarm optimization (SCPSO) to improve text document clustering. Considering both global and local optimization functions, randomization is carried out with the initial population. This research work aims at combining spectral clustering with swarm optimization to deal with huge volumes of text documents. The proposed SCPSO algorithm is examined on benchmark databases against other existing approaches: spherical K-means, the expectation maximisation method (EM), and the standard PSO algorithm. The concluding results show that the proposed SCPSO algorithm yields better clustering accuracy than the other clustering techniques.
Article
As a new evolutionary computation method, the structure of backtracking search optimization algorithm (BSA) is simple and the exploration capability of it is strong. However, the global performance of the BSA is significantly affected by mutation strategies and control parameters. Designing appropriate mutation strategies and control parameters is important to improve the global performance of the BSA. In this paper, an adaptive BSA with knowledge learning (KLBSA) is developed to improve the global performance of the BSA. In the method, an adaptive control parameter based on the global and local information of the swarms in the current iteration is designed to adjust the search step length of individuals, which helps to balance the exploration and exploitation abilities of the algorithm. Moreover, a new mutation strategy based on the guidance of different information is designed to improve the optimization ability of the algorithm. In addition, a multi-population strategy is implemented to thoroughly improve the searching ability of the algorithm for different searching areas. To this end, experiments on three groups of benchmark functions and three real-world problems are implemented to verify the performance of the proposed KLBSA algorithm. The results indicate that the proposed algorithm performs competitively and effectively when compared to some other evolutionary algorithms.
Article
Crossover is an important operation in genetic algorithms (GA). The crossover operation is responsible for producing offspring for the next generation so as to explore a much wider area of the solution space. There are many crossover operators designed to cater to the different needs of different optimization problems. Despite the many analyses, it is still difficult to decide which crossover to use when. In this article, we consider the various existing crossover operators based on the applications and purposes for which they were designed. We classify the existing crossover operators into two broad categories, namely (1) crossover operators for the representation of applications, where the operators designed to suit the representation aspect of applications are discussed along with how they work, and (2) crossover operators for improving GA performance of applications, where operators designed to influence the quality of the solution and the speed of the GA are discussed. We also suggest some interesting future directions in the area of designing new crossover operators as a result of our survey.
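Two classical operators within the survey's scope, sketched over binary strings for concreteness: one-point and uniform crossover. The implementations are generic illustrations, not taken from the article.

```python
# Generic sketches of one-point and uniform crossover on binary strings.
import numpy as np

rng = np.random.default_rng(0)

def one_point(p1, p2):
    cut = rng.integers(1, len(p1))          # crossover point
    return (np.concatenate([p1[:cut], p2[cut:]]),
            np.concatenate([p2[:cut], p1[cut:]]))

def uniform(p1, p2, p=0.5):
    mask = rng.random(len(p1)) < p          # gene-wise coin flip
    return np.where(mask, p1, p2), np.where(mask, p2, p1)

a, b = rng.integers(0, 2, 10), rng.integers(0, 2, 10)
print(one_point(a, b))
print(uniform(a, b))
```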
Article
Data clustering has been proven to be an effective method for discovering structure in medical datasets. The majority of clustering algorithms produce exclusive clusters, meaning that each sample can belong to one cluster only. However, most real-world medical datasets have inherently overlapping information, which could be best explained by overlapping clustering methods that allow one sample to belong to more than one cluster. One of the simplest and most efficient overlapping clustering methods is known as overlapping k-means (OKM), which is an extension of the traditional k-means algorithm. Being an extension of the k-means algorithm, the OKM method also suffers from sensitivity to the initial points. In this paper, we propose a hybrid method that combines the k-harmonic means and overlapping k-means algorithms (KHM-OKM) to overcome this limitation. The main idea behind the KHM-OKM method is to use the output of the KHM method to initialize the cluster centers of the OKM method. We have tested the proposed method using the FBCubed metric, which has been shown to be the most effective measure for evaluating overlapping clustering algorithms regarding homogeneity, completeness, rag bag, and the cluster size-quantity tradeoff. According to results from ten publicly available medical datasets, the KHM-OKM algorithm outperforms the original OKM algorithm and can be used as an efficient method for clustering medical datasets.
Chapter
The support vector machine is one of the classical machine learning techniques that can still help solve big data classification problems. In particular, it can help multi-domain applications in a big data environment. However, the support vector machine is mathematically complex and computationally expensive. The main objective of this chapter is to simplify this approach using process diagrams and data flow diagrams to help readers understand the theory and implement it successfully. To achieve this objective, the chapter is divided into three parts: (1) modeling of a linear support vector machine; (2) modeling of a nonlinear support vector machine; and (3) the Lagrangian support vector machine algorithm and its implementations. The Lagrangian support vector machine, with simple examples, is also implemented using the R programming platform on Hadoop and non-Hadoop systems.
Article
This paper describes new dynamic split-and-merge operations for evolving cluster models, which are learned incrementally and expanded on-the-fly from data streams. These operations are necessary to resolve the effects of cluster fusion and cluster delamination, which may appear over time in data stream learning. We propose two new criteria for cluster merging: a touching and a homogeneity criterion for two ellipsoidal clusters. The splitting criterion for an updated cluster applies a 2-means algorithm to its sub-samples and compares the quality of the split cluster with that of the original cluster by using a penalized Bayesian information criterion; the cluster partition of higher quality is retained for the next incremental update cycle. This new approach is evaluated using two-dimensional and high-dimensional streaming clustering data sets, where feature ranges are extended and clusters evolve over time—and on two large streams of classification data, each containing around 500K samples. The results show that the new split-and-merge approach (a) produces more reliable cluster partitions than conventional evolving clustering techniques and (b) reduces impurity and entropy of cluster partitions evolved on the classification data sets.
Article
This paper introduces the Backtracking Search Optimization Algorithm (BSA), a new evolutionary algorithm (EA) for solving real-valued numerical optimization problems. EAs are popular stochastic search algorithms that are widely used to solve non-linear, non-differentiable and complex numerical optimization problems. Current research aims at mitigating the effects of problems that are frequently encountered in EAs, such as excessive sensitivity to control parameters, premature convergence and slow computation. In this vein, development of BSA was motivated by studies that attempt to develop simpler and more effective search algorithms. Unlike many search algorithms, BSA has a single control parameter. Moreover, BSA's problem-solving performance is not over-sensitive to the initial value of this parameter. BSA has a simple structure that is effective, fast and capable of solving multimodal problems and that enables it to easily adapt to different numerical optimization problems. BSA's strategy for generating a trial population includes two new crossover and mutation operators. BSA's strategies for generating trial populations and controlling the amplitude of the search-direction matrix and search-space boundaries give it very powerful exploration and exploitation capabilities. In particular, BSA possesses a memory in which it stores a population from a randomly chosen previous generation for use in generating the search-direction matrix. Thus, BSA's memory allows it to take advantage of experiences gained from previous generations when it generates a trial population. This paper uses the Wilcoxon Signed-Rank Test to statistically compare BSA's effectiveness in solving numerical optimization problems with the performances of six widely used EA algorithms: PSO, CMAES, ABC, JDE, CLPSO and SADE. The comparison, which uses 75 boundary-constrained benchmark problems and three constrained real-world benchmark problems, shows that in general, BSA can solve the benchmark problems more successfully than the comparison algorithms.
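A compact, hedged sketch of BSA's two distinctive mechanisms as described above: the historical-population memory and trial generation M = P + F(oldP - P), followed by a binary crossover map and greedy selection. The benchmark function, bounds, and parameter values are illustrative, and the original crossover map is simplified here.

```python
# Hedged sketch of BSA's core loop: memory, mutation, crossover, selection.
import numpy as np

def bsa(obj, dim=10, pop=30, iters=200, mixrate=1.0, seed=0):
    rng = np.random.default_rng(seed)
    low, high = -5.0, 5.0                          # assumed bounds
    P = rng.uniform(low, high, (pop, dim))
    oldP = rng.uniform(low, high, (pop, dim))      # historical memory
    fit = np.apply_along_axis(obj, 1, P)
    for _ in range(iters):
        if rng.random() < rng.random():            # occasionally refresh memory
            oldP = P.copy()
        oldP = oldP[rng.permutation(pop)]          # shuffle memory
        F = 3.0 * rng.standard_normal()            # search-direction amplitude
        M = P + F * (oldP - P)                     # mutation
        cmap = rng.random((pop, dim)) < mixrate * rng.random((pop, 1))
        T = np.clip(np.where(cmap, M, P), low, high)   # simplified crossover
        tfit = np.apply_along_axis(obj, 1, T)
        better = tfit < fit
        P[better], fit[better] = T[better], tfit[better]  # greedy selection
    return P[fit.argmin()], fit.min()

x, f = bsa(lambda v: (v ** 2).sum())               # sphere test function
print(f)
```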
Article
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.
Article
A genetic algorithm-based clustering technique, called GA-clustering, is proposed in this article. The searching capability of genetic algorithms is exploited in order to search for appropriate cluster centres in the feature space such that a similarity metric of the resulting clusters is optimized. The chromosomes, which are represented as strings of real numbers, encode the centres of a fixed number of clusters. The superiority of the GA-clustering algorithm over the commonly used K-means algorithm is extensively demonstrated for four artificial and three real-life data sets.
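The encoding is the crux of this approach: a chromosome is a flat string of real numbers holding k cluster centres, and fitness rewards tight clusters. A minimal sketch of the encoding and fitness (the genetic operators themselves are omitted):

```python
# Sketch of the GA-clustering encoding: real-valued chromosome = k centres;
# fitness = negated total distance of points to their nearest centre.
import numpy as np

def decode(chrom, k, dim):
    return chrom.reshape(k, dim)            # k centres of dimension dim

def fitness(chrom, X, k):
    C = decode(chrom, k, X.shape[1])
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return -d.min(axis=1).sum()             # higher fitness = tighter clusters

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
chrom = X[rng.choice(len(X), 3, replace=False)].ravel()
print(fitness(chrom, X, 3))
```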
Conference Paper
The k-means method is a widely used clustering technique that seeks to minimize the average squared distance between points in the same cluster. Although it offers no accuracy guarantees, its simplicity and speed are very appealing in practice. By augmenting k-means with a very simple, randomized seeding technique, we obtain an algorithm that is Θ(log k)-competitive with the optimal clustering. Preliminary experiments show that our augmentation improves both the speed and the accuracy of k-means, often quite dramatically.
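The seeding technique is D² sampling: each new centre is drawn with probability proportional to its squared distance from the nearest centre already chosen. A direct sketch, assuming numpy:

```python
# D^2 sampling: the k-means++ seeding step sketched directly.
import numpy as np

def kmeanspp_seed(X, k, rng):
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        probs = d2 / d2.sum()               # proportional to squared distance
        centres.append(X[rng.choice(len(X), p=probs)])
    return np.array(centres)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
seeds = kmeanspp_seed(X, 4, rng)
```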
B.A. Hassan, T.A. Rashid, Artificial intelligence algorithms for natural language processing and the semantic web ontology learning, arXiv preprint arXiv:2108.13772 (2021).
T.A. Rashid, A Multi-disciplinary Ensemble Algorithm for Clustering Heterogeneous Datasets, Neural Comput.
P. Bholowalia, A. Kumar, EBK-means: a clustering technique based on elbow method and k-means in WSN, Int. J. Comput. Appl. 105 (2014).
A. Ghosal, A. Nandy, A.K. Das, S. Goswami, M. Panday, A Short Review on Different Clustering Techniques and Their Applications, in: Emerg. Technol. Model. Graph., Springer, 2020, pp. 69-83.
D. Lapp, Heart Disease Dataset, 2019. https://www.kaggle.com/johnsmith88/heart-disease-dataset (accessed October 4, 2020).
R.S. Forsyth, UCI Machine Learning Repository, 1990. http://archive.ics.uci.edu/ml. Mapperley Park, Nottingham NG3 5DX.
L.J. Rubini, UCI Machine Learning Repository, 2015. http://archive.ics.uci.edu/ml. Mapperley Park, Nottingham NG3 5DX.