Article

Prediction of Breast Cancer using Random Forest, Support Vector Machines and Naïve Bayes



... Machine learning algorithms as tools have been used to create predictive models for BC to support physicians' decisions with acceptable accuracy [17]. However, these models show some limitations, such as the use of appropriate methods to fit the model depending on the dataset without considering feature extraction techniques [18]; proper feature extraction techniques effectively reduce dimensionality for the better prediction of the disease [19]. There is also an increasing concern regarding the methods of handling missing values in the dataset [20]. ...
... Researchers have used a number of methods to impute missing values in datasets that occur due to the incorrect collection of data values. Some of the methods include deletion, mean, imputer method, mode and median [18,32,33]. Deletion causes a great loss of information when missing values are concentrated in a single feature [34]. ...
... (x n ,y n )} RF is a bagging algorithm that aims at regularizing the point where the model quality is high and its variance and basic problems are not compromised. To avoid the problem of overfitting RF, it is customary to build thousands of trees [18]. ...
Article
Full-text available
Breast cancer is a prevalent disease that affects mostly women, and early diagnosis will expedite the treatment of this ailment. Recently, machine learning (ML) techniques have been employed in biomedical and informatics to help fight breast cancer. Extracting information from data to support the clinical diagnosis of breast cancer is a tedious and time-consuming task. The use of machine learning and feature extraction techniques has significantly changed the whole process of a breast cancer diagnosis. This research work proposed a machine learning model for the classification of breast cancer. To achieve this, a support vector machine (SVM) was employed for the classification, and linear discriminant analysis (LDA) was employed for feature extraction. We measured our model’s feature extraction performance in principal component analysis (PCA) and random forest for classification. A comparative analysis of the proposed model was performed to show the effectiveness of the feature extraction, and we computed missing values based on the classifier’s accuracy, precision, and recall. The original Wisconsin Breast Cancer dataset (WBCD) and Wisconsin Prognostic Breast Cancer dataset (WPBC) were used. We evaluated performance in two phases: In phase 1, rows containing missing values were computed using the mean, and in phase 2, rows containing missing values were computed using the median. LDA–SVM when median was used to compute missing values has better results, with accuracy of 99.2%, recall of 98.0% and precision of 98.0% on the WBCD dataset and an accuracy of 79.5%, recall of 76.0% and precision of 59.0% on the WPBC dataset. The SVM classifier had a better performance in handling classification problems when LDA was applied and the median was used as a method for computing missing values.
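The two imputation phases described in this abstract (filling missing values with the mean, then with the median) can be illustrated with a minimal sketch. The helper name `impute` and the sample column are hypothetical and are not taken from the cited paper; missing entries are assumed to be encoded as `None`.

```python
from statistics import mean, median


def impute(column, strategy="median"):
    """Replace None entries in a numeric column with the mean or median
    of the observed (non-missing) values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in column]


# Hypothetical feature column with one missing entry and one large value.
feature_column = [1, 1, 2, None, 1, 10]

mean_filled = impute(feature_column, "mean")      # gap filled with 3
median_filled = impute(feature_column, "median")  # gap filled with 1
```

In this toy column the single large value pulls the mean up to 3 while the median stays at 1, which is one plausible reason the choice of imputation method can change downstream classifier results.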
... A Random Forest Algorithm [13,26] takes the decision tree concept a step further by producing a big number of decision trees to make a forest. These trees are reformed on the basis of selection of data and variables randomly. ...
... SVM [7,9,11,[14][15][16][17]21,23,25,26,30,38,39] is a concept that is used to classify the labeled data and to apply regression analysis on the given dataset. There is a hyper plane to have sets of input data where SVM divides the dataset into two classes as the classification is done on the basis of labeled data in the best possible way. ...
... The Bayes' theorem describes the NB classifier [25,26] with independence assumption between predictors. Bayesian classifiers are statistical classifiers based on probability. ...
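As a rough illustration of the Naïve Bayes description above (Bayes' theorem with an independence assumption between predictors), the following sketch fits a Gaussian variant on toy data. The function names and the data are hypothetical, and the Gaussian likelihood is an assumption; the snippet does not say which likelihood the cited work used.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior per class and a (mean, variance) pair per feature."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        stats = []
        for values in zip(*members):  # iterate feature by feature
            mu = sum(values) / n
            # Small constant keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in values) / n + 1e-9
            stats.append((mu, var))
        model[label] = (n / len(rows), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior. Features are treated
    as independent, so their log-likelihoods are simply summed."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mu, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy two-feature data, not drawn from any real dataset.
rows = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [8.0, 9.0], [9.0, 8.0], [10.0, 10.0]]
labels = ["benign"] * 3 + ["malignant"] * 3
nb_model = fit_gaussian_nb(rows, labels)
```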
Article
Full-text available
Cancer has been a critical disease for many years, and it leads to death if it is not diagnosed at an early stage. It is a topic of concern because no definitive treatment for this disease has been found to date. Patients can be saved only if the disease is found at an early stage (I or II); if it is detected at a later stage (III or IV), the chance of survival is very low. Machine learning and data mining techniques are very helpful in handling this problem. Machine learning is demonstrating the promise of producing consistently accurate estimates. A machine learning system effectively “learns” how to estimate from a training dataset of completed operations. There are various techniques available in machine learning to predict cancer on the basis of collected standard datasets. The datasets may have been recorded by social media, healthcare websites and other repositories. Machine learning classifiers need to be applied to these datasets to detect cancer in a human. The main aim of the review is to support research on accurate estimation, i.e. to help other researchers carry out relevant estimation studies using machine-learning techniques. Our review suggests that these techniques are competitive with traditional estimators on datasets and also demonstrates that these methods are sensitive to the data on which they are trained.
... Therefore, six papers were included in this study: three for 2016 and another three for 2017. [36][37][38][39][40][41] The 2018 search produced 5650 papers. Three sample papers were briefly read to verify selection. ...
... and Polat and Senturk 27 (accuracy = 91.37%). The following studies achieved the highest accuracy for the WBCD: Abdar and Makarenkov 43 (accuracy = 100%), Elgedawy 41 (accuracy = 99.42%) and Hernández-Julio et al. 34 (accuracy = 99.40%). The AUC was the second most commonly used summary measure. Patrício et al.'s 20 study had the best ranking AUC ([87, 91] 95% confidence interval) for the BCCD and the study by Bazazeh and Shubair 37 had the highest AUC (99.90%) for the WBCD. Finally, only Hung et al.'s 25 study reported the F1 score (82%) which was for the BCCD. ...
... As with Abdar's and Makarenkov's 43 study, Elgedawy 41 should have used some form of CV to handle the bias in the dataset. In contrast to Elgedawy, 41 Hernández-Julio et al. 34 identified the following important WBCD features: Uniformity of Cell Size, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei and Normal Nucleoli. Using these features, Hernández-Julio et al.'s 34 model ranked in third position with an accuracy of 99.40% by 10-fold CV which was created with the MATLAB software. ...
Article
Full-text available
The objective 1 of this study was to investigate trends in breast cancer (BC) prediction using machine learning (ML) publications by analysing country, first author, journal, institutional collaborations and co-occurrence of author keywords. The objective 2 was to provide a review of studies on BC prediction using ML and a blood analysis dataset (Breast Cancer Coimbra Dataset [BCCD]), the objective 3 was to provide a brief review of studies based on BC prediction using ML and patients’ fine needle aspirate cytology data (Wisconsin Breast Cancer Dataset [WBCD]). The design of this study was as follows: for objective 1: bibliometric analysis, data source PubMed (2015-2019); for objective 2: systematic review, data source: Google and Google Scholar (2018-2019); for objective 3: systematic review, data source: Google Scholar (2016-2019). The results showed that the United States of America (USA) produced the highest number of publications (n=803). In total, 2419 first authors contributed towards the publications. Breast Cancer Research and Treatment was the highest ranked journal. Institutional collaborations mainly occurred within the USA. The use of ML for BC screening and detection was the most researched topic. A total of 19 distinct papers were included for objectives 2 and 3. The findings from these studies were never presented to clinicians for validations. In conclusion, the use of ML for BC screening and detection is promising.
... In literature, many various studies about the data mining algorithms usage in cancer analysis have done until this time [16][17][18][19][20][21][22][23][24]. The Random Forest (RF) Algorithm can be used for classification in many areas including Health/Medical. ...
... According to the results of that paper, the K-Nearest Neighbor Algorithm gave results with an accuracy of 90.0%. In the paper of Elgedawy [23], the performance of 3 algorithms including Random Forest, SVM and, Naïve Bayes was used to predict whether the patients' trauma is benign or malignant and compared in terms of accuracy, precision, recall and F-measure. According to the experimental results, Random Forest Algorithm gave the best accuracy result as 99.42% and F-measure result as 0.995, while SVM has the accuracy ratio as 98.8% and Naïve Bayes has the accuracy ratio as 98.24% using the Wisconsin Breast Cancer Database data set. ...
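The four measures named in this snippet (accuracy, precision, recall and F-measure) all follow from confusion-matrix counts. The sketch below shows one way to compute them; the function name, the choice of "malignant" as the positive class, and the toy labels are assumptions for illustration and do not reproduce the figures reported above.

```python
def binary_metrics(y_true, y_pred, positive="malignant"):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Toy labels: 3 malignant and 5 benign cases, with 2 mistakes in total.
y_true = ["malignant"] * 3 + ["benign"] * 5
y_pred = ["malignant", "malignant", "benign",
          "benign", "benign", "benign", "benign", "malignant"]
scores = binary_metrics(y_true, y_pred)
```

Here accuracy is 6/8 = 0.75, while precision, recall and F-measure are all 2/3, which shows how accuracy alone can look better than the per-class measures on an imbalanced sample.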
... Also, Random Forests use techniques which assume that the observations in the sample are independent. In the literature [16][17][18][19][20][21][22][23][24][37][38][39][40][41] about cancer detection, tree-based classification algorithms are widely used. RF Algorithm is one of the most successful tree-based classification algorithms. ...
Article
Breast cancer is the most common type of cancer among women worldwide. Regular checks and effective, timely treatment are important factors in patients' survival. Compared with existing imaging methods, microwave imaging has been considered a more powerful and effective method by many researchers. In this paper, comprehensive design equations and parameters of a rectangular microstrip patch antenna (RMPA) are given for microwave breast cancer detection. The layered breast model with a spherical tumor that was placed into the fibro-glandular layer was created by using CST Microwave Studio Software, and it was embedded in canola oil to decrease the distorted signals between the transmitting and receiving antennas. The RMPA has a wideband performance from 3 to 18 GHz. The simulation results show that differences in the electric field and reflection coefficients might make it possible to locate the tumor in the breast model more efficiently. In addition, in this study, the data obtained from these experiments are classified by using the random forest algorithm, a data mining method. According to the classification result, the random forest algorithm can diagnose breast cancer by classifying the tumor with 94% accuracy.
... A Random Forest Algorithm [13,26] takes the decision tree concept a step further by producing a big number of decision trees to make a forest. These trees are reformed on the basis of selection of data and variables randomly. ...
... SVM [2,[6][7][8]11,[14][15][16][17]21,23,25,26,30,38,39] is a concept that is used to classify the labelled data and to apply regression analysis on the given dataset. There is a hyperplane to have sets of input data where SVM divides the dataset into two classes as the classification is done on the basis of labelled data in the best possible way. ...
... The Bayes' theorem describes the NB classifier [25,26] with independence assumption between predictors. Bayesian classifiers are statistical classifiers based on probability. ...
... A. Support Vector Machine: SVM [13], [14] is a concept that is used to classify the labelled data and to apply regression analysis on the given dataset. There is a hyperplane to have sets of input data where SVM divides the dataset into two classes as the classification is done based on labelled data in the best possible way. ...
... B. Naïve Bayes Classifier: The Bayes theorem [13], [15] describes the NB classifier [25,26] with an independence assumption between predictors. Bayesian classifiers are statistical classifiers based on probability. ...
Article
Cancer is one of the deadliest diseases diagnosed among the population across the globe. The number of cases is increasing each year so are the different diagnostic tools and techniques and technologies. Significant increase in the mortality rate worldwide leads to tremendous scope to devise and implement the latest computer-aided diagnostic systems for early detection. One such technique is machine learning coupled with medical imaging modalities that have proved to be efficient in diagnosing various medical conditions. The current study presents a review of different machine-learning techniques applied to emerging modalities for cancer diagnosis from 2018 to 2022. It was found that traditional machine learning algorithms like SVM, and GMM performed very well in classification. But overall deep learning has dominated the field of medical image analysis. Researchers have achieved 100% accuracy in the classification of cancerous and normal tissue images using medical image analysis with the help of machine learning. This survey presents the studies based on Lymphoma cancer analysis based on HIS systems
... e.g. Model Regression Network [12]. A. Support Vector Machine: SVM [13], [14] is a concept that is used to classify the labeled data and to apply regression analysis on the given dataset. There is a hyperplane to have sets of input data where SVM divides the dataset into two classes as the classification is done on the basis of labeled data in the best possible way. ...
... B. Naïve Bayes Classifier: The Bayes theorem [13], [15] describes the NB classifier [25,26] with independence assumption between predictors. Bayesian classifiers are statistical classifiers based on probability. ...
Preprint
Full-text available
Cancer is one of the deadliest diseases diagnosed among the population across the globe. The number of cases is increasing each year so are the different diagnosis tools and techniques and technologies. Significant increase in the mortality rate worldwide leads to tremendous scope to device and implement latest computer aided diagnostic systems for early detection. One such technique is machine learning coupled with medical imaging modalities that have proved to be efficient in diagnosing various medical conditions. Current study presents a review of different machine learning techniques applied on emerging modalities for cancer diagnosis from 2018 to 2022. It was found that traditional machine learning algorithms like SVM, GMM performed very well in classification. But overall deep learning has dominated the field of medical image analysis. Researchers have achieved 100% accuracy in classification of cancerous and normal tissue images using medical image analysis with the help of machine learning. This survey presents the studies based on Lymphoma cancer analysis based on HIS systems
... By using this tool, SVM obtained the highest accuracy (97.13%) and the lowest error rate of all the algorithms compared. There were three types of ML techniques used by Madeeh Nayer Elgedawy [26]. These techniques included Naïve Bayes, SVM and Random Forest. ...
Article
Full-text available
Globally, ovarian cancer affects women disproportionately, causing significant morbidity and mortality rates. The early diagnosis of ovarian cancer is necessary for enhancing patient health and survival rates. This research article explores the utilization of Machine Learning (ML) techniques alongside eXplainable Artificial Intelligence (XAI) methodologies to aid in the early detection of ovarian cancer. ML techniques have recently gained popularity in developing predictive models to detect early-stage ovarian cancer. These predictions are made using XAI in a transparent and understandable way for healthcare professionals and patients. The primary aim of this study is to evaluate the effectiveness of various ovarian cancer prediction methodologies. This includes assessing K Nearest Neighbors, Support Vector Machines, Decision trees, and ensemble learning techniques such as Max Voting, Boosting, Bagging, and Stacking. A dataset of 349 patients with known ovarian cancer status was collected from Kaggle. The dataset included a comprehensive range of clinical features such as age, family history, tumor markers, and imaging characteristics. Preprocessing techniques were applied to enhance input data, including feature scaling and dimensionality reduction. A Minimum Redundancy Maximum Relevance (MRMR) algorithm was used to select the features in the model. Our experimental results demonstrate that in Support Vector Machines, we found 85 % base model accuracy and 89 % accuracy after stacking several ensemble learning techniques. With the help of XAI, complex ML algorithms can be given more profound insights into their decision-making, improving their applicability. This paper aims to introduce the best practices for integrating ML and artificial intelligence in biomarker evaluation. Building and evaluating Shapley values-based classifiers and visualizing results were the focus of our investigation. 
The study contributes to the field of oncology and women's health by offering a promising approach to the early diagnosis of ovarian cancer.
... To overcome the limitations of DTs, a large number of trees are constructed using random samples and replacements. The observations are categorized by each tree, and the final decision is made based on the majority vote of the trees (Elgedawy 2017). ...
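The procedure sketched in this snippet (many trees built on random samples drawn with replacement, with the final decision taken by majority vote) can be shown in a few lines. This is a simplified illustration, not the cited implementation: one-split decision stumps stand in for full decision trees, each stump looks at one randomly chosen feature, and all names and data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))  # random feature per stump
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Every stump votes; the majority class is the final decision."""
    votes = Counter(left if row[f] <= t else right
                    for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Toy, well-separated data; not drawn from any real dataset.
rows = [[1, 2], [2, 1], [3, 2], [2, 3], [8, 9], [9, 8], [10, 9], [9, 10]]
labels = ["benign"] * 4 + ["malignant"] * 4
forest = fit_forest(rows, labels)
```

Individual stumps can be poor (a bootstrap sample may even contain a single class), but the vote across all of them is much more stable, which is the point the snippet makes about overcoming the limitations of single trees.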
Article
Full-text available
Background Breast cancer is a major public health concern, and early diagnosis and classification are critical for effective treatment. Machine learning and deep learning techniques have shown great promise in the classification and diagnosis of breast cancer. Purpose In this review, we examine studies that have used these techniques for breast cancer classification and diagnosis, focusing on five groups of medical images: mammography, ultrasound, MRI, histology, and thermography. We discuss the use of five popular machine learning techniques, including Nearest Neighbor, SVM, Naive Bayesian Network, DT, and ANN, as well as deep learning architectures and convolutional neural networks. Conclusion Our review finds that machine learning and deep learning techniques have achieved high accuracy rates in breast cancer classification and diagnosis across various medical imaging modalities. Furthermore, these techniques have the potential to improve clinical decision-making and ultimately lead to better patient outcomes.
... The model proposed here outperformed other existing strategies in terms of performance. Madeeh Nayer Elgedawy [28] applied Naive Bayes, SVM, and RF machine learning techniques. The most suited and successful algorithm among them is RF, which achieves an accuracy of 99.42 percent. NB and SVM, on the other hand, have accuracy scores of 98.24 percent and 98.8 percent, respectively. ...
... The parametric examination of seven different machine learning algorithms is included in this research. The following is a brief summary of the methods used in this paper [22]. ...
Article
Full-text available
Breast cancer is the one common cause of death in both developed worlds and the most death-causing disease diagnosed among women. Early recognition of this condition can help to minimize death rates. The breast problem statement, in brief, is not reliable for accuracy recognition. They have a high degree of classification accuracy as well as diagnostic capabilities. The most common classifications are normal, benign cancer, and malignant cancer. Machine learning (ML) techniques are now widely used in the classification of breast cancer. In this paper, some machine learning techniques have been investigated to diagnose breast cancer (BC) on magnetic resonance imaging (MRI) images using multi-step processes. The first step has been to take the MRI image as an input image and have been pre-processing an image, then use feature extraction by using (scale-invariant feature transform (SIFT), histogram of oriented gradient (HOG), local binary patterns (LBP), bag of words (BoW), and edge-oriented histogram (EOH)). Next step we implement the classifying algorithms (KNN, decision tree (DT), naïve Bayes, ANN, SVM, RF, AdaBoost), have been used to detect and classify the normal or breast cancer region for this purpose datasets like ACRIN-Contralateral-Breast-MRI, In and breast cancer MRI dataset) has been collected our breast cancer MRI images from Erbil and Sulaymaniyah hospital the results was 91.9%, the result of ACRIN was 97% and the results Breast Cancer was 92.3%.
... It could be very vital to save you this spreading impact via way of means of an analysis of most cancers with inside the early stages the usage of superior strategies and kit. Into the current eras, there remains numerous energies to appoint synthetic intellect & different associated techniques to help with the recognition of most cancers [15] in advanced phases. Initial recognition of most cancers improves the growth of existence hazards by 97.99% [9]. ...
Article
Full-text available
Cancer is the foremost cause behind the most death pace of people around the world. Cancer of breast is the primary reason for mortality among females. There have been various investigation or experimentation aimed at the discovery and interpretation of facts has been done on early expectation and discovery of breast cancer disease to begin treatment and increment the opportunity of endurance. Utmost research targets x-ray pictures of the breasts. Although, photographs of the breasts made by X-rays occasionally produces a threat of fake recognition which can compromise the medical status of infectious person. It’s crucial and import to locate opportunity techniques that might be simpler to put into effect and work with extraordinary records sets, inexpensive and safer, which could produce an extra dependable prognosis. This research journal recommends an associated prototype of numerous DLA (Deep Learning Algorithms) including ANN (Artificial Neural Network) and CNN (Convolutional Neural Networks) for efficient breast cancer detection and prediction. The research exploration utilizes the x-rays image database (as base research datasets) for prediction, detection, and diagnosis of breast cancer. This anticipated research prototype may be associated with several clinical examination data i.e. text, audio, image, video, blood, urine and many more.
... Every tree categorizes its observations, and the choice is made based on the votes of the majority of the trees (Elgedawy, 2017). ...
Article
Full-text available
Breast cancer is one of the most dangerous diseases and the second largest cause of women cancer death. Techniques and methods have been adopted for early indications of the disease signs as it's the only effective way of managing breast cancer in women. This review explores the techniques used for breast cancer in Computer-Aided Diagnosis (CAD) using image analysis, deep learning and traditional machine learning. It primarily gives an introduction to the various strategies of machine learning, followed by an explanation of the various deep learning techniques and particular architectures for breast cancer detection and their classification. After the review, the researcher recommended the need for the inclusion of deep learning in machine learning because it performs multi-functions in enabling medical diagnosis. Also, it is important to involve the integration of more than learning methods in medical learning to improve the process of medical diagnostic imaging and their benefits and limitations, recent advancements and development are discussed by reviewing the existing secondary sources. This study reviews papers published from 2015 (early publications on breast cancer) to 2021. This paper is a review of the latest works and techniques have done in the field with the future trends and problems in breast cancer categorization and diagnosis.
... To forecast the development of breast cancer, this research presents a methodology based on the supervised machine learning algorithms Decision Tree (DT) [16], K-Nearest Neighbour (K-NN) [15], Naive Bayes (NB) and Support Vector Machine (SVM) [17] [18]. In this study, the big amount of data is critical. ...
Chapter
Cancer is a fatal disease that is constantly changing and affects a vast number of individuals worldwide. At the research level, much work has gone into the creation and improvement of techniques built on data mining approaches that allow for the early identification and prevention of breast cancer. Because of its excellent diagnostic abilities and effective classification, data mining technologies have a reputation in the medical profession that is continually increasing. Data mining and machine learning approaches can aid practitioners in conceiving and developing tools to aid in the early detection of breast cancer. As a result, the goal of this research is to compare different machine learning algorithms in order to determine the best way for detecting breast cancer promptly. This study assessed the classification accuracy of four machine learning algorithms: KNN, Decision Tree, Naive Bayes, and SVM in order to find the most accurate supervised machine learning algorithm that might be used to diagnose breast cancer. Naive Bayes has the maximum accuracy for the supplied dataset, according to the prediction results. This reveals that, when compared to KNN, SVM, and Decision Tree, Naive Bayes can be utilized to predict breast cancer.
Keywords: Breast cancer; Machine learning; Data mining; Algorithm; Classification
... A good number of research works have reported the use of machine learning algorithms for breast cancer predictions [9]. However, these models show some limitations such as the issue of choosing an appropriate method to fit the model, without consideration to include feature extraction techniques [9], inability to choose proper feature extraction techniques to effectively reduce the dimensionality for better prediction of the disease [10]. Some of the feature extraction techniques have limited ability in handling large data which could lead to overfitting and handling of missing values [11], [12]. ...
Conference Paper
Full-text available
With the recent advances in clinical technologies, a huge amount of data has been accumulated for breast cancer diagnosis. Extracting information from the data to support the clinical diagnosis of breast cancer is a tedious and time-consuming task. The use of machine learning and data mining techniques has significantly changed the whole process of a breast cancer diagnosis. In this research, a prediction model for breast cancer prediction has been developed using features extracted from individual medical screening and tests. To overcome the problem of overfitting and obtain a good prediction accuracy, a Linear Discriminant Analysis (LDA) is applied for the extraction of useful features. This is done to reduce the number of features in the experimental dataset. The proposed model can create new features from the existing features and then get rid of the original features. The newly created features were able to summarize the necessary information contained initially in the original set of features. LDA was chosen because of its usefulness in detecting whether a set of features is worthwhile in predicting breast cancer. In addition to LDA, the proposed model uses Support Vector Machine (SVM) for accurate prediction, hence the name LDA-SVM prediction model. Based on 5-fold cross-validation, the proposed model yields an accuracy of 99.2%, precision of 98.0%, and Recall of 99.0% when tested on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California- Irvine machine learning repository. Therefore, SVM shows high efficiency in handling classification problems when combined with feature extraction techniques.
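The 5-fold cross-validation mentioned in this abstract can be outlined as follows. This is a generic sketch under stated assumptions, not the authors' evaluation code: the function names are hypothetical, and `fit` and `predict` are placeholders for whatever model is being evaluated (the LDA-SVM pipeline itself is not reproduced here).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.
    Every sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, rows, labels, k=5, seed=0):
    """Average the held-out accuracy over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k, seed):
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Ten hypothetical samples split into five folds of two.
splits = list(k_fold_indices(10, k=5))
```

Because each model is scored only on rows it never saw during fitting, the averaged accuracy is a less optimistic estimate than accuracy measured on the training data.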
... In earlier studies, the classification of breast cancer images centralized on traditional machine learning methods such as Support Vector Machine (SVM) [41][42][43], Naïve Bayes [44][45][46], and Random Forest [47,48]. Machine learning involves the algorithms design and deployment to assess data and corresponding attributes without any prior task based on predetermined inputs from the environment [49]. ...
Article
Full-text available
Medical imaging is gaining significant attention in healthcare, including breast cancer. Breast cancer is the most common cancer-related death among women worldwide. Currently, histopathology image analysis is the clinical gold standard in cancer diagnosis. However, the manual process of microscopic examination involves laborious work and can be misleading due to human error. Therefore, this study explored the research status and development trends of deep learning on breast cancer image classification using bibliometric analysis. Relevant works of literature were obtained from the Scopus database between 2014 and 2021. The VOSviewer and Bibliometrix tools were used for analysis through various visualization forms. This study is concerned with the annual publication trends, co-authorship networks among countries, authors, and scientific journals. The co-occurrence network of the authors’ keywords was analyzed for potential future directions of the field. Authors started to contribute to publications in 2016, and the research domain has maintained its growth rate since. The United States and China have strong research collaboration strengths. Only a few studies use bibliometric analysis in this research area. This study provides a recent review on this fast-growing field to highlight status and trends using scientific visualization. It is hoped that the findings will assist researchers in identifying and exploring the potential emerging areas in the related field.
... Coudray et al. (2018) used a model deep convolutional neural network (inception v3) on wholeslide images obtained from The Cancer Genome Atlas (TCGA) and obtained an improved AUC score from 0.733 to 0.856 in the detection of cancer subtype. Elgedawy et al. (2017) applied 3 machine learning techniques: Naïve Bayes, SVM and RF. Out of them RF is the most appropriate and useful algorithm to give the best accuracy as 99.42% where SVM and NB produced 98.8% and 98.24% accuracy respectively. ...
... RF algorithm is used at the regularization point where the model quality is highest, variance and bias problems are compromised [14]. RF builds numerous numbers of DTs using random samples with a replacement to overcome the problem of DTs. ...
Article
Full-text available
Cancer is the second cause of death in the world. 8.8 million patients died due to cancer in 2015. Breast cancer is the leading cause of death among women. Several types of research have been done on early detection of breast cancer to start treatment and increase the chance of survival. Most of the studies concentrated on mammogram images. However, mammogram images sometimes have a risk of false detection that may endanger the patient's health. It is vital to find alternative methods which are easier to implement and work with different data sets, cheaper and safer, that can produce a more reliable prediction. This paper proposes a hybrid model combined of several Machine Learning (ML) algorithms including Support Vector Machine (SVM), Artificial Neural Network (ANN), K-Nearest Neighbor (KNN), Decision Tree (DT) for effective breast cancer detection. This study also discusses the datasets used for breast cancer detection and diagnosis. The proposed model can be used with different data types such as image, blood, etc.
Article
Full-text available
Breast cancer is one of the most prevalent chronic diseases affecting women. Overcoming it requires effective medical treatment, and early detection plays an important role in suitable medication and patient survival. Mammography is the standard imaging technique used to identify breast cancer in patients. Because cancer tissues are subtle and varied, interpreting mammogram images can be a challenge for doctors. Machine Learning (ML) and Deep Learning (DL) techniques offer promising solutions for efficient breast cancer detection from mammograms. This review paper presents a comprehensive review of ML and DL algorithms and their applications in mammogram image analysis. Various supervised and unsupervised learning techniques, such as convolutional neural networks (CNNs), support vector machines (SVMs), random forests, and other popular ML and DL models, are discussed. The paper also covers how these DL methods are integrated into image preprocessing, feature extraction, and classification strategies. The survey focuses on the performance metrics, datasets, and benchmarks used in existing studies. Further, the strengths and limitations of the approaches used by various researchers are identified. By examining current research trends, this paper aims to contribute to the ongoing development of more accurate and reliable breast cancer detection systems using advanced ML techniques.
Article
Full-text available
Breast cancer is the primary cause of mortality among women worldwide. Numerous studies have addressed early prediction and detection of breast cancer so that treatment can start sooner and the chance of survival increases. Most of this research targets mammogram images. However, mammogram images occasionally carry a risk of false detection, which can endanger the patient's health. It is crucial to find alternative techniques that are simpler to implement, work with different data sets, and are cheaper and safer, so that they can produce a more reliable prediction. This research paper proposes a combined prototype of several Deep Learning (DL) algorithms, including Artificial Neural Network (ANN) and Convolutional Neural Networks (CNN), for efficient breast cancer prediction and detection. This research study also describes the image datasets used for breast cancer prediction, detection, and diagnosis. The proposed research prototype can be used with several data types such as text, audio, image, video, blood, urine, etc.
Article
Full-text available
Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison of twelve feature selection methods (e.g. Information Gain) evaluated on a benchmark of 229 text classification problem instances that were gathered from Reuters, TREC, OHSUMED, etc. The results are analyzed from multiple goal perspectives-accuracy, F-measure, precision, and recall-since each is appropriate in different situations. The results reveal that a new feature selection metric we call 'Bi-Normal Separation' (BNS), outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly challenging for induction algorithms. A new evaluation methodology is offered that focuses on the needs of the data mining practitioner faced with a single dataset who seeks to choose one (or a pair of) metrics that are most likely to yield the best performance. From this perspective, BNS was the top single choice for all goals except precision, for which Information Gain yielded the best result most often. This analysis also revealed, for example, that Information Gain and Chi-Squared have correlated failures, and so they work poorly together. When choosing optimal pairs of metrics for each of the four performance goals, BNS is consistently a member of the pair-e.g., for greatest recall, the pair BNS + F1-measure yielded the best performance on the greatest number of tasks by a considerable margin.
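The abstract above names Bi-Normal Separation but does not state its formula. As commonly defined, BNS scores a feature by |F⁻¹(tpr) − F⁻¹(fpr)|, where F⁻¹ is the inverse standard normal cumulative distribution function, tpr is the fraction of positive documents containing the feature and fpr is the fraction of negative documents containing it. The following is a minimal sketch under that definition; the function name, argument names and the clipping constant `eps` are assumptions made for illustration, not details taken from the abstract.

```python
from statistics import NormalDist

def bi_normal_separation(tp, fp, pos, neg, eps=0.0005):
    """Score one feature by |F^-1(tpr) - F^-1(fpr)|.

    tp  -- positive documents that contain the feature
    fp  -- negative documents that contain the feature
    pos -- total positive documents (must be > 0)
    neg -- total negative documents (must be > 0)
    eps -- assumed clipping bound; keeps rates away from 0 and 1,
           where the inverse normal CDF is undefined
    """
    inverse_cdf = NormalDist().inv_cdf
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inverse_cdf(tpr) - inverse_cdf(fpr))

# Hypothetical counts: a feature seen in 80 of 100 positive documents
# and in 10 of 100 negative documents.
score = bi_normal_separation(tp=80, fp=10, pos=100, neg=100)
```

A feature that occurs equally often in both classes scores zero, and the score grows as the two occurrence rates move apart, which is why features can be ranked by this value when selecting a subset.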
Article
Full-text available
The Wisconsin Breast Cancer Epidemiology Simulation Model is a discrete-event, stochastic simulation model using a systems-science modeling approach to replicate breast cancer incidence and mortality in the U.S. population from 1975 to 2000. Four interacting processes are modeled over time: (1) natural history of breast cancer, (2) breast cancer detection, (3) breast cancer treatment, and (4) competing cause mortality. These components form a complex interacting system simulating the lives of 2.95 million women (approximately 1/50 the U.S. population) from 1950 to 2000 in 6-month cycles. After a “burn in” of 25 years to stabilize prevalent occult cancers, the model outputs age-specific incidence rates by stage and age-specific mortality rates from 1975 to 2000. The model simulates occult as well as detected disease at the individual level and can be used to address “What if?” questions about effectiveness of screening and treatment protocols, as well as to estimate benefits to women of specific ages and screening histories.
Decision tree induction systems: a Bayesian analysis
  • W L S Buntine
Buntine, W. L. (2013). Decision tree induction systems: a Bayesian analysis. arXiv preprint arXiv:1304.2732.
Machine learning: an algorithmic perspective
  • S Marsland
Marsland, S. (2015). Machine learning: an algorithmic perspective. CRC Press.
Mining Big Data: Breast Cancer Prediction using DT-SVM Hybrid Model
  • K Sivakami
Sivakami, K. (2015). Mining Big Data: Breast Cancer Prediction using DT-SVM Hybrid Model. International Journal of Scientific Engineering and Applied Science, 1(5).