Conference Paper

Machine learning approaches for breast cancer diagnosis and prognosis

Authors: A. Sharma, S. Kulshrestha, S. Daniel

... According to the World Health Organization 2020 cancer report, breast cancer ranks fourth in the world and second in Turkey. Many researchers have studied machine learning methods in different domains and approaches, since these methods can be trained automatically and improved with training datasets (Cinarer & Emiroglu, 2020; Ganggayah et al., 2019; Sharma et al., 2017; Solanki et al., 2021; Wang et al., 2018; Zheng et al., 2014). Feature selection is an important phase for choosing informative features, which improves classification performance and reduces training time. ...
... There are many studies on predicting breast cancer (Sharma et al., 2017; Solanki et al., 2021). These studies apply various machine learning methods such as Decision Tree, Naive Bayes, Support Vector Machines, Random Forest and Neural Networks. ...
... In (Sharma et al., 2017), the authors apply three machine learning algorithms, namely logistic regression, nearest neighbor and support vector machines, to classify features of the Wisconsin dataset. Their method achieves accuracies between 93% and 97%. ...
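The excerpt names the classifiers but not how they were trained. As an illustration only, not the code used by Sharma et al., here is a minimal pure-Python sketch of logistic regression fitted by batch gradient descent on the mean log-loss. The function names, learning rate, epoch count and the four toy points (centred so the two classes are symmetric about the origin) are assumptions; a real study would use a library implementation on the Wisconsin features.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X, threshold=0.5):
    """Label 1 (malignant) when the predicted probability reaches the threshold."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold)
            for xi in X]


if __name__ == "__main__":
    # Hypothetical, already-centred toy features; labels 0 = benign, 1 = malignant.
    X = [[-2.0, -1.0], [-1.0, -0.5], [1.0, 0.5], [2.0, 1.0]]
    y = [0, 0, 1, 1]
    w, b = fit_logistic_regression(X, y)
    print(predict(w, b, X))
```

On the toy data the fitted model reproduces the training labels, [0, 0, 1, 1]. Real features would normally be scaled first, since unscaled inputs slow gradient descent considerably.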
Article
One out of every six deaths in the world is caused by cancer. Breast cancer is the most common type of cancer. The main purpose of this study is to investigate the classification performance of three machine-learning algorithms, namely logistic regression, multilayer perceptron and random forest, using the Wolf Search Algorithm (WSA), a bio-inspired feature selection method, for predicting breast cancer. Feature selection methods are used to remove uninformative and noisy features to improve classification accuracy. Experiments are conducted on the Wisconsin Diagnostic Breast Cancer Database using ten-fold cross-validation. The experimental results on the Wisconsin Diagnostic Breast Cancer dataset show that the multilayer perceptron is the best algorithm, with 96.31% accuracy. With the reduced feature set selected by WSA, the accuracy increases to 97.01%. We conclude that classification accuracy is highest with an artificial neural network-based classifier using the reduced feature set chosen by WSA.
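Ten-fold cross-validation partitions the data into ten folds and averages the score obtained when each fold in turn is held out for testing. The sketch below assumes contiguous, unshuffled folds, since the abstract does not describe the exact splitting procedure; `k_fold_indices` and `cross_val_accuracy` are hypothetical helper names. With 569 samples, the size of the Wisconsin Diagnostic dataset, ten folds give nine test folds of 57 samples and one of 56.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous, non-overlapping test folds.

    The first n_samples % k folds receive one extra sample. No shuffling or
    stratification is done here.
    """
    if n_samples < k:
        raise ValueError("need at least k samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(fit_predict, X, y, k=10):
    """Mean fold accuracy; fit_predict(X_tr, y_tr, X_te) must return labels for X_te."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / k
```

Any classifier can be plugged in through `fit_predict`. In practice the indices are usually shuffled, and often stratified by class, before splitting, so that each fold keeps roughly the benign/malignant ratio of the full dataset.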
... For image data, data augmentation can be used to increase the size of the dataset. Ayush Sharma [4] compares ML algorithms for predicting breast cancer tumors using the Wisconsin Prognostic dataset. Logistic Regression gives the highest accuracy, 96.89%. ...
... The result is either class 0, which indicates a benign tumor, or class 1, which indicates a malignant tumor. Finally, the results obtained from the various ML algorithms were compared with respect to accuracy, as given in (4). ...
Article
Machine Learning (ML) gives systems the capacity to learn automatically and to improve with experience without being explicitly programmed. ML plays an important role in medical science, where it is being used to develop new practices for handling large volumes of patient data. Breast cancer is a chronic disease commonly diagnosed in women. According to a WHO survey, breast cancer ranks first among cancers in females. Breast cancer has two kinds of tumour: Benign Tumour (BT) and Malignant Tumour (MT). BTs are treated as non-cancerous cells and MTs as cancerous cells. MTs that are not identified in time spread to other organs. Because the treatment procedures for BT and MT differ, it is important to determine precisely whether a tumour is BT or MT. In the proposed model, histopathology images are used as the dataset. These images are pre-processed using Gaussian blur and K-means segmentation, and the pre-processed data are fed into a feature extraction model. ML algorithms such as Support Vector Machine (SVM), Random Forest (RF) and Convolutional Neural Network (CNN) are applied to the extracted features. The performance of these algorithms is analysed using accuracy, precision, recall and F1-score. CNN gives the highest accuracy, 87%.
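Accuracy, precision, recall and F1-score all derive from the four cells of the binary confusion matrix. The minimal sketch below treats the malignant class (label 1) as positive, which is an assumption about how the labels are coded; the function names and the eight-sample example are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For y_true = [1, 1, 1, 0, 0, 0, 0, 1] and y_pred = [1, 0, 0, 0, 0, 1, 0, 1] there are 2 true positives, 1 false positive, 2 false negatives and 3 true negatives, giving accuracy 0.625, precision 2/3, recall 0.5 and F1 4/7 (about 0.571).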
... In the last few decades, several data mining and machine-learning techniques have been developed for breast cancer detection and classification [5][6][7]. These approaches can be divided into three main stages: preprocessing, feature extraction, and classification. ...
... Sharma et al. [7] applied three machine-learning algorithms to predict breast cancer. The experimental results gave training accuracies ranging from 93% to 97%. ...
Article
Breast cancer is the most diagnosed cancer among women around the world. The development of computer-aided diagnosis tools is essential to help pathologists accurately interpret and discriminate between malignant and benign tumors. This paper proposes an automated proliferative breast lesion diagnosis based on machine-learning algorithms. We used Tabu search to select the most significant features; each feature is evaluated by the dependency degree of the attribute in the rough set. The reduced features were classified using five machine-learning algorithms. The proposed models were applied to the BIDMC-MGH and Wisconsin Diagnostic Breast Cancer datasets and evaluated against five performance criteria. The top performing models were AdaBoost and logistic regression. Comparisons with other works show the efficiency of the proposed method for superior diagnosis of breast cancer against the reviewed classification techniques.
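The abstract scores features by their dependency degree in rough set theory. Under the standard textbook definition, which we assume here because the abstract gives no formula, the dependency of the decision D on an attribute subset B is γ_B(D) = |POS_B(D)| / |U|: the fraction of objects whose B-values alone determine the decision unambiguously. A Tabu search would call a scoring function like this on candidate subsets; the search itself is not shown, `dependency_degree` is a hypothetical name, and attributes are assumed to be already discretised.

```python
from collections import defaultdict


def dependency_degree(rows, decisions, attrs):
    """Rough-set dependency degree gamma_B(D) = |POS_B(D)| / |U|.

    rows: list of tuples of discrete attribute values.
    attrs: indices of the attributes in the subset B.
    An equivalence class of B is in the positive region when all of its
    objects share the same decision value.
    """
    classes = defaultdict(list)
    for row, d in zip(rows, decisions):
        classes[tuple(row[a] for a in attrs)].append(d)
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)
```

A subset with γ = 1 separates the classes perfectly on the training objects; lower values mean some objects with identical attribute values carry conflicting diagnoses.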
... The author compared many algorithms, such as Discriminant Analysis, MLP, Decision Tree, Logistic Regression, SVM, NB and KNN, and found SVM to be the best performer. Sharma et al. (2017) discussed ML approaches for breast cancer diagnosis and prognosis and compared three supervised ML techniques, Logistic Regression, KNN and SVM, on both the WBCD and WPBC datasets. ...
... Apart from population-based, clinical and molecular data, several studies have used ANN and measured its accuracy in predicting individuals' risk of breast cancer. The results suggest that ANN is capable of discriminating between malignant and benign findings, especially when ten-fold cross-validation is used to estimate its performance (Senturk, Z.K., Kara, R., 2014; Sharma, A., Kulshrestha, S., Daniel, S., 2017). It has been established that ANN exhibits superior performance in predicting breast cancer risk. ...
Conference Paper
To improve the quality of life and increase the survival rate of individuals with cancer, early detection and treatment play a crucial role. Early detection and diagnosis are associated with an almost 100 percent survival rate, especially before or during stage I. However, when detection comes at stage IV, the survival rate is as low as 30 percent. The quest to foster early detection has driven the evolution of machine learning techniques, which have emerged in response to cancer’s big data. This paper examines Machine Learning (ML) predictive models that have been applied in the early detection of cancer, along with some of their associated benefits and drawbacks. According to the results documented in most of the current literature, ML techniques offer remarkable improvements in the prediction and classification accuracy of cancer. The implication is that future healthcare systems ought to combine various ML techniques with multidimensional heterogeneous data to produce more accurate cancer prediction and classification.
... The aim is to study the significance of ML in disease diagnostics, prediction and ML-based healthcare applications [6]. Sharma et al. [7] applied ML approaches to breast cancer diagnosis and prognosis. The study achieved training accuracies of 93-97% with standard models such as Support Vector Machines, Logistic Regression and a Nearest Neighbour classifier [7]. ...
Conference Paper
Healthcare has become a multi-billion-dollar business in recent years. Yet healthcare systems still suffer from many drawbacks, such as the non-availability of treatment, doctors and other resources, because of which many people lose their lives. In this paper, we identify the potential of Machine Learning (ML) for the development of robust healthcare systems. We believe that, owing to inherent capabilities such as learning from experience and independence from human intervention, ML can play a vital role. This paper brings together how ML has been applied in various healthcare systems, and the approaches used, as reported in the available literature. This survey should benefit the healthcare development community to a great extent.
... However, the clustering algorithm only achieved 68% accuracy. In [31], the authors study the Wisconsin Breast Cancer (Original) data set [32]; they used min-max normalization for feature scaling and built classifiers using K-nearest neighbor (KNN), SVM, and Logistic Regression. Their results showed training accuracies ranging from 93% to 97%. ...
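Min-max normalization rescales each feature x to (x - min) / (max - min), so that features measured on very different scales contribute comparably to a distance-based classifier such as KNN. The sketch below is an illustration under stated assumptions, not the cited authors' code: it learns the minima and maxima from the training rows only (so no information leaks from the test data), uses Euclidean distance and a plain majority vote, and the helper names and toy values are hypothetical.

```python
import math
from collections import Counter


def min_max_fit(X):
    """Per-feature (min, max) learned from the training rows only."""
    return [(min(col), max(col)) for col in zip(*X)]


def min_max_transform(X, ranges):
    # Constant features map to 0; test values outside the training range
    # fall outside [0, 1], which is expected.
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, ranges)] for row in X]


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: math.dist(x, X_train[i]))[:k]
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Without scaling, a feature measured in the hundreds (such as an area) would dominate the distance over one measured in fractions (such as smoothness), which is the reason for normalizing before KNN.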
Article
Cancer is one of the diseases that kill the most women in the world, with breast cancer responsible for the highest number of cancer cases and consequently deaths. However, it can be countered by early detection and, consequently, early treatment. Any advance in detecting or predicting this kind of cancer is important for better health. Many studies focus on building a model with high accuracy in cancer prediction, but accuracy alone is not always a reliable metric. This study takes an investigative approach to the performance of different boosting-based machine learning algorithms for predicting breast cancer, focusing on the recall metric. Boosting algorithms have proven to be effective tools for detecting medical diseases. The dataset from the University of California, Irvine (UCI) repository, with its attributes, was used to train and test the classifiers. The main objective of this study is to use state-of-the-art boosting algorithms, namely AdaBoost, XGBoost, CatBoost and LightGBM, to predict and diagnose breast cancer and to find the most effective model with respect to recall, ROC-AUC and the confusion matrix. Previous studies have applied Optuna, a library for hyperparameter optimization, to individual algorithms such as XGBoost or LightGBM, but no prior research has examined all four boosting algorithms within a unified Optuna framework together with the SHAP method, which improves the interpretability of the model and can support the identification and prediction of breast cancer. We were able to improve AUC or recall for all the models and reduce the false negatives for AdaBoost and LightGBM; the final AUC was above 99.41% for all models.
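ROC-AUC can be read as the probability that a randomly chosen positive (malignant) case receives a higher score than a randomly chosen negative one, whereas recall depends on where the decision threshold is placed. The pure-Python sketch below illustrates both under simple assumptions (labels coded 0/1 with 1 as malignant, ties counted as half a win); it is not the study's Optuna/SHAP pipeline, and the function names and toy scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. Runs in O(P * N), fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_at_threshold(y_true, scores, threshold):
    """Recall (sensitivity) when scores >= threshold are labelled malignant."""
    tp = sum(t == 1 and s >= threshold for t, s in zip(y_true, scores))
    fn = sum(t == 1 and s < threshold for t, s in zip(y_true, scores))
    return tp / (tp + fn)
```

With y_true = [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], the AUC is 0.75. Lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 1.0, removing the false negative at the cost of one false positive, while the AUC is unchanged because it does not depend on any single threshold.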
... The extensive literature review conducted in this study found very few studies on the application of ML and AI techniques to breast cancer diagnosis and screening in Saudi Arabia. Zain et al. [44][45][46] studied the effectiveness of various ML algorithms, including NB, K-NN, and the fast decision tree (REPTree), for predicting breast cancer recurrence and found that K-NN produced a better prediction without principal component analysis (F-measure = 72.1%). Similarly, Sultana et al. [47], using SVM and multi-classifiers, found that SVM offered higher accuracy and F-score than the multi-classifiers. ...
Article
Breast cancer represents a significant health concern, particularly in Saudi Arabia, where it ranks as the most prevalent cancer type among women. This study focuses on leveraging eXplainable Artificial Intelligence (XAI) techniques to predict benign and malignant breast cancer cases using various clinical and pathological features specific to Saudi Arabian patients. Six distinct models were trained and evaluated based on common performance metrics such as accuracy, precision, recall, F1 score, and AUC-ROC score. To enhance interpretability, Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) were applied. The analysis identified the Random Forest model as the top performer, achieving an accuracy of 0.72, along with robust precision, recall, F1 score, and AUC-ROC score values. Conversely, the Support Vector Machine model exhibited the poorest performance metrics, indicating its limited predictive capability. Notably, the XAI approaches unveiled variations in the feature importance rankings across models, underscoring the need for further investigation. These findings offer valuable insights into breast cancer diagnosis and machine learning interpretation, aiding healthcare providers in understanding and potentially integrating such technologies into clinical practices.
... At present, some machine learning algorithms have been used for early diagnosis or prediction of cancer [2][3][4]. For example, Hosseinpour et al. [5] achieved superior performance by predicting overall breast cancer risk through the improved random forest algorithm. ...
Chapter
Some disease datasets have varying degrees of missing data, which leads to low classification accuracy. To improve the effectiveness of breast cancer detection and diagnosis, a classification prediction method combining KNN imputation (KNNI) and XGBoost was proposed and applied to the classification and analysis of breast cancer data. First, KNNI is used to impute the missing data in the breast cancer patient dataset. Then, the dataset is balanced with the SMOTE oversampling method. Finally, XGBoost is used to extract features strongly related to breast cancer malignancy as the input of the model; the XGBoost model is optimized by a grid search algorithm to find the optimal model parameters and is then used to classify and diagnose the breast cancer dataset. The experimental results show that KNNI can effectively recover the lost data, improve data quality, and improve the subsequent classification accuracy. Using imputation so that data with missing values can be fed flexibly to machine learning methods holds great promise.
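The first two steps of this pipeline, KNN imputation and SMOTE oversampling, can be sketched in plain Python. This is a simplified illustration, not the chapter's implementation: missing values are marked with None, distances for imputation use only the features observed in the incomplete row, and SMOTE places each synthetic point at a uniformly random position between a minority sample and one of its k nearest minority neighbours. The XGBoost feature extraction and grid search steps are not shown, and all names and parameters are assumptions.

```python
import math
import random


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest complete rows.

    Distance is computed only over the features observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    out = []
    for r in rows:
        if None not in r:
            out.append(list(r))
            continue
        obs = [j for j, v in enumerate(r) if v is not None]
        nearest = sorted(complete, key=lambda c: math.dist([r[j] for j in obs],
                                                           [c[j] for j in obs]))[:k]
        out.append([v if v is not None else sum(c[j] for c in nearest) / len(nearest)
                    for j, v in enumerate(r)])
    return out


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolation toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

SMOTE is normally applied to the training portion only, so that synthetic points derived from test samples do not inflate the evaluation.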
... In this context, Ereken et al. [6] explored the use of convolutional neural networks to detect malignant tumors in mammographic images and obtained promising recall results, reaching 84%. Sharma et al. [7] tested multiple machine learning algorithms, such as support vector machine (SVM), k-nearest neighbors (K-NN), and logistic regression, for breast cancer diagnosis. They demonstrated their effectiveness in improving diagnostic results, with the best accuracy, 96.89%, obtained by logistic regression. ...
Conference Paper
Breast cancer is the most widespread type of cancer among women worldwide. The World Health Organization estimates that 2.3 million new breast cancer cases are recorded annually. Breast cancer is also the leading cause of cancer mortality in the female population, claiming more than 685,000 lives in 2020. In response to the alarming spread of breast cancer and its significant impact on women's health, it has become imperative to develop innovative techniques and methods for early detection, accurate diagnosis, and effective treatment. In this perspective, the current paper compares several machine learning methods enhanced with data balancing, feature selection, and Bayesian-search hyperparameter tuning. The dataset employed is an unbalanced set of 569 entries comprising 31 medical features associated with breast cancer. With machine learning, data balancing, feature selection, and hyperparameter optimization methods, we can make significant strides in improving the accuracy of breast cancer classification and prediction techniques. All models in our study performed well, with some exceeding 98% on all classification metrics, which will improve breast cancer diagnosis and treatment systems and offer healthcare professionals more practical resources.
... Convolutional neural networks (CNNs) [9], a subfield of machine learning, and artificial intelligence (AI) are among the healthcare industry's hottest new trends. AI [10] and machine learning (ML) belong to the field of study that focuses on developing better technological systems to handle complicated tasks with less reliance on human intellect [11]. ...
Article
Breast cancer is a common cause of female mortality in developing countries. Early detection and treatment are crucial for successful outcomes. Breast cancer develops from breast cells and is considered a leading cause of death in women. This disease is classified into two subtypes: invasive ductal carcinoma (IDC) and ductal carcinoma in situ (DCIS). Advances in artificial intelligence (AI) and machine learning (ML) techniques have made it possible to develop more accurate and reliable models for diagnosing and treating this disease. The literature shows that combining MRI with convolutional neural networks (CNNs) is helpful in breast cancer detection and prevention. In addition, detection strategies have shown promise in identifying cancerous cells. The CNN Improvements for Breast Cancer Classification (CNNI-BCC) model helps doctors spot breast cancer by using a trained deep learning neural network to categorize breast cancer subtypes. However, such approaches require significant computing power for imaging and preprocessing. Therefore, in this research, we propose an efficient deep learning model capable of recognizing breast cancer in computerized mammograms of varying densities. Our research relied on three distinct modules for feature selection: removal of low-variance features, univariate feature selection, and recursive feature elimination. Both the craniocaudal and medial-lateral views of mammograms are incorporated. We tested the model on a large dataset of 3002 merged images gathered from 1501 individuals who had digital mammography performed between February 2007 and May 2015. We applied six different classification models for the diagnosis of breast cancer: random forest (RF), decision tree (DT), k-nearest neighbors (KNN), logistic regression (LR), support vector classifier (SVC), and linear support vector classifier (linear SVC). The simulation results show that our proposed model is highly efficient, as it requires less computational power and is highly accurate.
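The first feature-selection module, removal of low-variance features, drops columns whose values barely change across samples, since such columns carry little information for separating classes. A minimal sketch, assuming population variance and a threshold of 0 (keep only non-constant columns); the threshold the authors used is not stated, and `variance_threshold` is a hypothetical name.

```python
def variance_threshold(X, threshold=0.0):
    """Keep the columns whose population variance is strictly greater than threshold.

    Returns the kept column indices and the reduced rows.
    """
    n = len(X)
    keep = []
    for j, col in enumerate(zip(*X)):
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            keep.append(j)
    return keep, [[row[j] for j in keep] for row in X]
```

Because variance depends on scale, a non-zero threshold is only meaningful when features share comparable units or have been normalized beforehand.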
... SVM, KNN, and ANN are only a few of the techniques used to predict breast cancer. KNN, SVM, and logistic regression classifiers were used by Ayush Sharma [3] to detect breast cancer. ...
... In a review, Yassin et al. [51] discuss the conventional ML algorithms and classification techniques commonly employed in the recent past for breast cancer diagnosis. These algorithms include Decision Tree (DT) [52], Random Forest (RF), Support Vector Machines (SVM) [53], Naive Bayes (NB), K-Nearest Neighbor (KNN) [54], Linear Discriminant Analysis (LDA), and Logistic Regression (LR) [55,56]. ...
Article
Breast cancer is one of the leading causes of death among women across the globe. It is difficult to treat if detected at advanced stages. However, early detection can significantly increase the chances of survival and improve the lives of millions of women. Given the widespread prevalence of breast cancer, it is of utmost importance for the research community to provide a comprehensive framework encompassing early detection, classification, and diagnosis. The artificial intelligence research community, in coordination with medical practitioners, is developing such frameworks to automate the task of detection. With the surge in research activities coupled with the availability of large datasets and enhanced computational power, it is expected that AI frameworks will help even more clinicians make correct predictions. In this article, a novel framework for the classification of breast cancer using mammograms is proposed. The proposed framework not only uses features extracted from a Convolutional Neural Network (CNN) but also combines them with handcrafted features (Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP)), which helps embed domain expert knowledge, a step towards making the proposed framework transparent. Experimental results on the CBIS-DDSM dataset demonstrate that the proposed framework outperforms the current state-of-the-art methods in breast cancer classification.
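Of the handcrafted features named in the abstract, LBP is the simplest to show. The basic operator compares a pixel's eight neighbours with the pixel itself and packs the results into an 8-bit code; an image region is then described by the histogram of codes. This is a minimal sketch of that basic operator only, not the framework's feature extractor: the clockwise neighbour order starting at the top-left, the >= comparison and the skipped border pixels are conventions assumed here, and implementations differ on all three.

```python
def lbp_code(patch):
    """8-neighbour LBP code for the centre pixel of a 3x3 patch.

    Neighbours are read clockwise from the top-left corner; each neighbour that is
    >= the centre contributes a 1 bit, the first neighbour being the most significant.
    """
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, col in order:
        code = (code << 1) | (1 if patch[r][col] >= c else 0)
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels (borders skipped)."""
    hist = [0] * 256
    for i in range(1, len(image) - 1):
        for j in range(1, len(image[0]) - 1):
            patch = [row[j - 1:j + 2] for row in image[i - 1:i + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

For the patch [[5, 9, 1], [4, 6, 7], [2, 3, 8]] the bits read 01011000, so the code is 88; a perfectly flat patch yields 255 under the >= convention.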
... DL is a machine-learning technique that may be used for automated training and selection from datasets containing attributes related to breast cancer (Alzubi et al. 2020). Numerous researchers have used the WPBCC and WDBC benchmark datasets to evaluate both DL- and ML-based models over time (Aamir et al. 2022; Sharma et al. 2017). Numerous studies on the diagnosis of breast cancer have applied a range of ML classification techniques, including Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and others. ...
Article
Purpose: One of the leading causes of death among women is breast cancer. However, it has been established that early and accurate diagnosis can prolong the survival of patients with the illness. Deep learning (DL) and expert systems have proven beneficial and are gaining popularity in breast cancer diagnosis because of their effective taxonomy and high diagnostic capability. Method: This paper proposes a DL-based breast cancer model empowered with a rule-based hybrid feature selection mechanism that removes irrelevant features and thereby acts as a catalyst for improving diagnostic accuracy. The DL-enabled feature selection helps identify key attributes relevant to the diagnosis of breast cancer. The model has been tested on the well-known Wisconsin Breast Cancer Dataset (WBCD) and validated through performance measures such as accuracy, sensitivity, specificity, F-score, and ROC curves. Results: The experimental results revealed that the DL-based model with feature selection performed excellently compared with existing models on the same breast cancer dataset. The findings show a higher diagnostic accuracy of 99.5% and identify five insightful features that give significant clues for better diagnosis. Conclusion: The proposed model can predict the presence of breast cancer by identifying the features most relevant to its diagnosis. The system looks promising compared to other existing models for breast cancer.
... An Improved Instance-Based K-Nearest Neighbour ... The time complexity of Naïve Bayes, logistic regression and decision tree is analyzed using the breast cancer dataset. Logistic regression performs better than the other classifiers, with the highest accuracy (Mandal, 2017; Sharma et al., 2017). A decision-tree-based model evaluation is performed on the breast cancer dataset using data mining approaches (Ponnuraja, 2017). ...
Preprint
BACKGROUND: Breast cancer is the most common type of cancer in women worldwide and the leading cause of mortality for females. Although many breast cancer patients have no family members who have also had the disease, women with a family history of it are at higher risk than those without. OBJECTIVE: The aim of this research is to classify the death status of breast cancer patients using the Surveillance, Epidemiology, and End Results (SEER) dataset. Because it can handle enormous datasets systematically, machine learning has been widely employed in biomedical research to solve diverse classification problems. Pre-processing the data enables its visualization and analysis for use in making important decisions. METHODOLOGY: This research presents a feasible machine learning-based approach for classifying breast cancer datasets. A two-step feature selection method based on Variance Threshold and Principal Component Analysis (PCA) was employed to select features from the SEER breast cancer dataset. The selected features were then classified using supervised and ensemble learning techniques such as AdaBoost (AB), XGBoost (XGB) and Gradient Boosting (GB), as well as binary classification techniques such as Naive Bayes (NB) and Decision Tree (DT). RESULTS: The Decision Tree algorithm showed better results than the other algorithms used in this analysis (AB, XGB, GB and NB), achieving 98% accuracy with both the train-test split and cross-validation. CONCLUSION: The performance of various machine learning algorithms is examined using the train-test split and k-fold cross-validation approaches. According to the experimental data, the Decision Tree algorithm outperforms the other supervised and ensemble learning approaches.
... Features such as tumor radius, texture and fractal dimension are computed by this technique. The goal of this article is to predict breast tumors with advanced classifiers using the Wisconsin breast tumor dataset [12]. ...
Article
Because huge volumes of healthcare data go unused, recent researchers have focused on predicting diseases by analyzing past patient records. In the same vein, many studies have focused on predicting tumors in the human body. In this research, two widely used classification algorithms, Naïve Bayes and Random Tree, were implemented and analyzed with the UCI Machine Learning tumor dataset. The “Replace Missing Values” data cleaning technique in the WEKA tool was used to clean the data. Both algorithms were run on the original and on the cleaned dataset. The Random Tree algorithm performed well, with improved accuracy and a reduced error rate: accuracy was 90.8333% before data cleaning and 93.3333% after, and correspondingly the error rate fell from 9.1667% before data cleaning to 6.6667% after. In future, the data cleaning techniques have to be tuned further to improve accuracy.
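WEKA's ReplaceMissingValues filter, as we understand its documentation, replaces missing numeric values with the attribute mean and missing nominal values with the mode, both computed from the training data. A minimal per-column sketch of that idea follows; `replace_missing` and `error_rate` are hypothetical helpers, not WEKA APIs. The second helper makes explicit that, for a classifier that labels every instance, the error rate is 100% minus the accuracy, so accuracies of 90.8333% and 93.3333% correspond to error rates of 9.1667% and 6.6667%.

```python
from collections import Counter


def replace_missing(column):
    """Replace None with the column mean (numeric column) or mode (nominal column)."""
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


def error_rate(accuracy_pct):
    # Every instance is either correctly or incorrectly classified,
    # so the two percentages sum to 100.
    return 100.0 - accuracy_pct
```

Mean or mode replacement is cheap but ignores relationships between attributes; neighbour-based imputation, by contrast, fills a gap using similar records.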
... In a review, Yassin et al. [45] discuss the conventional ML algorithms and classification techniques commonly employed in the recent past for breast cancer diagnosis. These algorithms include Decision Tree (DT) [46], Random Forest (RF), Support Vector Machines (SVM) [47], Naive Bayes (NB), K-Nearest Neighbor (KNN) [48], Linear Discriminant Analysis (LDA), and Logistic Regression (LR) [49,50]. ...
Preprint
Breast cancer is one of the leading causes of death among women across the globe. It is difficult to treat if detected at advanced stages; however, early detection can significantly increase the chances of survival and improve the lives of millions of women. Given the widespread prevalence of breast cancer, it is of utmost importance for the research community to come up with a framework for early detection, classification and diagnosis. The artificial intelligence research community, in coordination with medical practitioners, is developing such frameworks to automate the task of detection. With the surge in research activities coupled with the availability of large datasets and enhanced computational power, it is expected that AI frameworks will help even more clinicians make correct predictions. In this article, a novel framework for the classification of breast cancer using mammograms is proposed. The proposed framework combines robust features extracted by a novel Convolutional Neural Network (CNN) with handcrafted features, namely HOG (Histogram of Oriented Gradients) and LBP (Local Binary Pattern). The results obtained on the CBIS-DDSM dataset exceed the state of the art.
... Deep learning (DL) is an ML approach that can be used for automated training and selection from the characteristics of breast cancer datasets [22]. The Wisconsin Prognostic Breast Cancer Chemotherapy (WPBCC) and WDBC datasets have been used as benchmarks in many studies over the years [23]. Many studies on the diagnosis of breast cancer have applied a variety of ML classification approaches, including Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and others. ...
Article
In today’s healthcare setting, the accurate and timely diagnosis of breast cancer is critical for recovery and treatment in the early stages. In recent years, the Internet of Things (IoT) has undergone a transformation that allows real-time and historical data to be analyzed using artificial intelligence (AI) and machine learning (ML) approaches. Medical IoT combines medical devices and AI applications with healthcare infrastructure to support medical diagnostics. The current state-of-the-art approach fails to diagnose breast cancer in its initial period, resulting in the death of most women. As a result, early breast cancer detection poses a tremendous problem for medical professionals and researchers. To address the difficulty of identifying early-stage breast cancer, we propose a medical IoT-based diagnostic system that competently distinguishes malignant from benign cases in an IoT environment. An artificial neural network (ANN) and a convolutional neural network (CNN) with hyperparameter optimization are used for malignant vs. benign classification, while a Support Vector Machine (SVM) and a Multilayer Perceptron (MLP) are used as baseline classifiers for comparison. Hyperparameters are important for machine learning algorithms since they directly control the behavior of the training algorithms and have a significant effect on model performance. We employ a particle swarm optimization (PSO) feature selection approach to select more satisfactory features from the breast cancer dataset and enhance the classification performance of MLP and SVM, while grid search is used to find the best combination of hyperparameters for the CNN and ANN models. The Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used to test the proposed approach. The proposed model achieved a classification accuracy of 98.5% using CNN and 99.2% using ANN.
... Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs) are examples of these techniques. Even though it is clear that applying machine learning algorithms can increase our understanding of cancer progression, adequate validation is required before these technologies can be used in clinical practice [3][4][5]. Many studies have addressed breast cancer diagnosis with the help of machine learning techniques. In 2005, Sun et al. compared feature selection approaches for unified breast cancer diagnosis in mammograms [6]. ...
... In their review, Yassin et al. [110] outline different conventional ML algorithms employed in the recent past for breast cancer diagnosis. These algorithms include, but are not limited to, Decision Tree (DT) [111,112]. ...
Article
Breast cancer is one of the leading causes of death among women. Early detection of breast cancer can significantly improve the lives of millions of women across the globe. Given the importance of finding a solution or framework for early detection and diagnosis, many AI researchers have recently focused on automating this task. Other reasons for the surge in research activity in this direction are the advent of robust AI algorithms (deep learning), the availability of hardware that can run and train these robust and complex algorithms, and access to datasets large enough to train them. The imaging modalities that researchers have exploited to automate breast cancer detection include mammograms, ultrasound, magnetic resonance imaging, histopathological images, or any combination of these. This article analyzes these imaging modalities, presents their strengths and limitations, and lists resources from which their datasets can be accessed for research purposes. It then summarizes AI and computer vision based state-of-the-art methods proposed in the last decade to detect breast cancer using various imaging modalities. We focus primarily on frameworks that have reported results using mammograms, as mammography is the most widely used breast imaging modality and usually the first test that medical practitioners prescribe for the detection of breast cancer. Another reason for focusing on mammograms is the availability of labelled datasets. Dataset availability is one of the most important aspects of developing AI-based frameworks, because such algorithms are data hungry and the quality of a dataset generally affects their performance. In a nutshell, this article is intended to act as a primary resource for the research community working in the field of automated breast imaging analysis.
... Yassin et al. [110], in their review, outlined different conventional ML algorithms employed in the recent past for breast cancer diagnosis. These algorithms include, but are not limited to, Decision Tree (DT) [111,112]. ...
Preprint
In the last decade, researchers working in the domain of computer vision and Artificial Intelligence (AI) have stepped up their efforts to develop automated frameworks that not only detect breast cancer but also identify its stage. This surge in research activity is mainly due to the advent of robust AI algorithms (deep learning), the availability of hardware that can train these robust and complex algorithms, and access to datasets large enough to train them. The imaging modalities that researchers have exploited to automate breast cancer detection include mammograms, ultrasound, magnetic resonance imaging, histopathological images, or any combination of these. This article analyzes these imaging modalities, presents their strengths and limitations, and lists resources from which their datasets can be accessed for research purposes. It then summarizes AI and computer vision based state-of-the-art methods proposed in the last decade to detect breast cancer using various imaging modalities. In general, we focus on reviewing frameworks that have reported results using mammograms, as mammography is the most widely used breast imaging modality and serves as the first test that medical practitioners usually prescribe for the detection of breast cancer. The second reason for focusing on mammograms is the availability of labeled datasets. Dataset availability is one of the most important aspects of developing AI-based frameworks, because such algorithms are data hungry and the quality of a dataset generally affects their performance. In a nutshell, this article is intended to act as a primary resource for the research community working in the field of automated breast imaging analysis.
... Logistic regression has been shown to be one of the most effective ML methods for cancer classification, specifically when using the BCWD [49]. This is attributed to its advantages with respect to model regularization and feature correlation constraints [50]. ...
... The Scott-Knott (SK) test, a hierarchical clustering algorithm developed by Scott and Knott (1974), is an efficient method for conducting multiple comparisons without ambiguity (Lopes Bhering et al., 2008). Compared with other statistical tests such as the Tukey test, the Student-Newman-Keuls (SNK) test and the t-test, the SK test is commonly used (Bony et al., 2001; Calinski & Corsten, 1985; Cox & Spjøtvoll, 1982; Sharma et al., 2018), and it can group techniques into non-ambiguous groups (Borges & Ferreira, 2003; Tsoumakas et al., 2005). In this study, the SK test was used to cluster the single and ensemble techniques based on their error rates (error rate = 1 - accuracy) and to check for significant differences between them. ...
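As a rough illustration of the SK procedure, the sketch below implements one partition step: the means (here, error rates = 1 - accuracy) are sorted, every split into two contiguous groups is tried, and the split with the largest between-groups sum of squares B0 is kept. It also computes the λ statistic from Scott and Knott's formulation as we understand it; in the full test, λ is compared with a chi-square critical value with k/(π - 2) degrees of freedom (for example via `scipy.stats.chi2.ppf`), and significant groups are split again recursively. The function names, the accuracies and the variance estimate `s2_mean` are hypothetical.

```python
import math


def sk_best_split(means):
    """One Scott-Knott partition step.

    Sorts the treatment means (here: error rates) and returns the split into two
    contiguous groups that maximises the between-groups sum of squares B0.
    """
    ys = sorted(means)
    k = len(ys)
    total = sum(ys)
    best_b0, best_cut = float("-inf"), None
    for cut in range(1, k):
        t1, t2 = sum(ys[:cut]), sum(ys[cut:])
        b0 = t1 ** 2 / cut + t2 ** 2 / (k - cut) - total ** 2 / k
        if b0 > best_b0:
            best_b0, best_cut = b0, cut
    return ys[:best_cut], ys[best_cut:], best_b0


def sk_lambda(means, b0, s2_mean, df):
    """Scott-Knott lambda statistic for a candidate split.

    s2_mean: estimated variance of a single mean (e.g. residual mean square / replicates);
    df: degrees of freedom of that estimate.
    """
    k = len(means)
    m = sum(means) / k
    sigma2 = (sum((y - m) ** 2 for y in means) + df * s2_mean) / (k + df)
    return math.pi / (2 * (math.pi - 2)) * b0 / sigma2


accuracies = {"SVM": 0.97, "MLP": 0.98, "DT": 0.90, "KNN": 0.89}   # hypothetical
errors = [1 - a for a in accuracies.values()]                     # error rate = 1 - accuracy
low, high, b0 = sk_best_split(errors)
print(low, high, sk_lambda(errors, b0, s2_mean=1e-4, df=36))
```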
Chapter
Breast Cancer (BC) is one of the most common forms of cancer among women. Detecting and accurately diagnosing breast cancer at an early phase increases women's chances of survival. For this purpose, various single classification techniques have been investigated to diagnose BC. Nevertheless, none of them has proved accurate in all circumstances. Recently, a promising approach called ensemble classifiers has been widely used to help physicians diagnose BC accurately. Ensemble classifiers combine a set of single classifiers by means of an aggregation layer. The literature generally shows that ensemble techniques outperform single ones when the ensemble members are accurate (i.e. have the lowest percentage error) and diverse (i.e. the single classifiers make uncorrelated errors on new instances). Hence, selecting ensemble members is often a crucial task, since poor selection can lead to the opposite outcome: single techniques outperforming their ensemble. This paper evaluates and compares ensemble member selection based on both accuracy and diversity with ensemble member selection based on accuracy only. A comparison with ensembles built without member selection was also performed. Ensemble performance was assessed in terms of accuracy and F1-score, and the Q statistic diversity measure was used to calculate classifier diversity. The experiments were carried out on three well-known BC datasets available from online repositories, using seven single classifiers. The Scott-Knott test and the Borda Count voting system were used to assess the significance of the performance differences and to rank ensembles according to their performance. The findings of this study suggest that: (1) considering both accuracy and diversity when selecting ensemble members often led to better performance, and (2) in general, selecting ensemble members using accuracy and/or diversity led to better ensemble performance than constructing ensembles without member selection.
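The Q statistic and Borda count mentioned above can be sketched as follows. This assumes the usual pairwise definition of Yule's Q on per-instance "oracle" outputs (1 = correct, 0 = wrong), averaged over all member pairs, and a Borda count that gives n-1 points to the first of n ranked candidates down to 0 for the last. The functions and example rankings are illustrative, not the chapter's code.

```python
from itertools import combinations


def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers.

    Inputs are per-instance oracle outputs: 1 if the classifier was right, 0 if wrong.
    Q is close to 1 when both tend to be right or wrong together and negative when
    they err on different instances (i.e. they are diverse).
    """
    pairs = list(zip(correct_a, correct_b))
    n11 = sum(1 for a, b in pairs if a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else float("nan")  # undefined if 0


def average_q(correctness):
    """Mean pairwise Q over all ensemble members (lower means more diverse)."""
    qs = [q_statistic(correctness[i], correctness[j])
          for i, j in combinations(range(len(correctness)), 2)]
    return sum(qs) / len(qs)


def borda_count(rankings):
    """Each ranking lists candidates best-first; in a list of n, position p earns n-1-p points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for p, cand in enumerate(ranking):
            scores[cand] = scores.get(cand, 0) + (n - 1 - p)
    order = sorted(scores, key=lambda c: scores[c], reverse=True)  # stable for ties
    return order, scores


# Hypothetical rankings of three ensembles on three datasets.
print(borda_count([["E1", "E2", "E3"], ["E2", "E1", "E3"], ["E1", "E3", "E2"]]))
```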
... The Logistic Regression in this paper uses Tikhonov (L2) regularization [25][26][27][28] as a trade-off factor between maximizing the likelihood and keeping the weights small, which tends to reduce error rates. Regularization is used to impose a penalty on the magnitude of the weight parameters w. ...
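A minimal sketch of L2 (Tikhonov) regularised logistic regression, assuming plain batch gradient descent on the mean log-loss plus (λ/2)‖w‖², with the bias left unpenalised. Function names, the learning rate and the toy data are hypothetical; the point is that the λw term in the update shrinks the weights, so a larger λ gives smaller weights.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg_l2(X, y, lam=0.1, lr=0.5, n_iter=2000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    X: list of feature vectors, y: 0/1 labels. The bias b is not penalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The Tikhonov term adds lam * w to the gradient, shrinking the weights.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


X = [[0.0], [1.0], [2.0], [3.0]]   # toy 1-D data
y = [0, 0, 1, 1]
w, b = fit_logreg_l2(X, y, lam=0.1)
print(w, b, [predict(w, b, x) for x in X])
```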
Article
Selecting the appropriate machine learning model for given domain-specific data is a recurring problem, and researchers still struggle with model selection when solving business problems. Along with model selection, researchers also face problems with the dataset: given all the features, separating important from unimportant features for predicting the target class is a challenging task. This paper addresses these issues by using univariate data analysis with machine learning classification techniques as a basic step in learning about the data. The objective of the paper is to perform multi-class classification of the different classes of mutation effects for the genes discussed. An advanced machine learning-based univariate analysis is performed on each dependent feature to obtain information about the data. We propose an optimized logistic regression technique that uses a stochastic gradient optimizer to predict the target classes. The model's predictions are evaluated with the multiclass log loss metric.
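The multiclass log loss used for evaluation is the mean negative log-probability a model assigns to each sample's true class. The sketch below assumes the common convention of renormalising each probability row and clipping it to [eps, 1 - eps], so that a confident wrong prediction yields a large but finite loss; the clipping constant and the example probabilities are assumptions.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean of -log p(true class).

    y_true: class indices; probs: one probability list per sample.
    Rows are renormalised and clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = row[label] / sum(row)
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


y_true = [0, 1, 2]
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]
print(multiclass_log_loss(y_true, probs))
```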
... Several other datasets are also available for analyzing breast cancer, including (1) the Wisconsin Prognostic Breast-Cancer Chemotherapy (WPBCC) dataset and (2) the Wisconsin Diagnostic Breast-Cancer (WDBC) dataset (Sharma et al. 2017). A large number of ML algorithms have been used to analyze these datasets. ...
... A. Sharma et al. [9] experimented with the Wisconsin Prognostic dataset for recurrence probability estimation, as well as with the WBCD, using different classifiers such as logistic regression, K-NN and SVMs. As in some related work, the dataset was split into 70% for training and 30% for testing. ...
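A 70%/30% split like the one described can be made stratified, so that the benign/malignant ratio is roughly the same in both parts. This is a sketch with a hypothetical helper `stratified_split`; the snippet does not say whether the original split was stratified.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.3, seed=42):
    """Return (train_idx, test_idx) with roughly equal class proportions in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                       # shuffle within each class
        n_test = round(len(idx) * test_frac)   # per-class share of the test set
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


labels = [0] * 70 + [1] * 30     # hypothetical benign (0) / malignant (1) labels
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```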
Conference Paper
Cancer, in general, is considered one of the leading causes of death worldwide. According to global cancer statistics, breast cancer, which is the leading cause of cancer death among women, is the second most diagnosed cancer, accounting for 11.6% of all cases. Whenever a lump or mass is found in the chest area, it is diagnosed as either a cancerous or a non-cancerous tumor, also known as malignant or benign, respectively. Proper diagnosis is vital so that the patient can start a treatment plan and recover as soon as possible. In this paper, we compare different machine learning algorithms that classify a patient's tumor from a given set of features. The Wisconsin Diagnostic Breast Cancer dataset is used to train and test the models, which are then compared with each other using different classification metrics to identify the most robust and accurate models and to compare them against state-of-the-art results.
... Its main advantage over other multiple-comparison procedures, such as the Tukey and Student-Newman-Keuls (SNK) tests, is that it can group techniques into non-ambiguous (i.e., non-overlapping) groups [45,46]. SK is the most frequently used test among those designed for similar purposes [47][48][49][50]. The SK technique has been used in many studies on the selection of base techniques for ensembles [39,51,52] and/or on clustering and ranking single techniques [53,54]. In this study, the SK test was used to check for significant differences between ensemble techniques based on error rate. ...
Article
Breast cancer is one of the major causes of death among women. Different decision support systems have been proposed to help oncologists accurately diagnose their patients. These decision support systems mainly use classification techniques to categorize the diagnosis as a malignant or benign tumor. Given that no consensus has been reached on a classifier that performs best in all circumstances, ensemble-based classification, which classifies patients by combining more than one single classification technique, has recently been investigated. In this paper, heterogeneous ensembles based on three well-known machine learning techniques (support vector machines, multilayer perceptron, and decision trees) were developed and evaluated by investigating the impact of the ensemble members' parameter values on classification performance. In particular, we investigate three parameter tuning techniques: Grid Search (GS), Particle Swarm Optimization (PSO), and the default parameters of the Weka tool, to evaluate whether tuning ensemble parameters yields more accurate breast cancer classification over four datasets obtained from the Machine Learning repository. The heterogeneous ensembles in this study were built using majority voting as the combination rule. The overall results suggest that: (1) using GS or PSO for single techniques provides more accurate classification; (2) in general, ensembles generate more accurate classifications than their single techniques regardless of the optimization technique used; (3) heterogeneous ensembles based on optimized single classifiers generate better results than the Uniform Configuration of Weka (UC-WEKA) ensembles; and (4) PSO and GS have roughly the same impact on ensemble performance.
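The majority-voting combination rule can be sketched in a few lines. This assumes hard (label) voting over each member's predictions; the tie-breaking rule (the first label encountered wins, following `Counter.most_common` ordering) is our assumption, since with an odd number of members and two classes ties cannot occur.

```python
from collections import Counter


def majority_vote(member_predictions):
    """Hard majority voting.

    member_predictions[m][i] is member m's label for sample i; returns one label per sample.
    Ties go to the label encountered first (Counter.most_common keeps insertion order).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_predictions)]


svm_pred = [1, 0, 1, 0]   # hypothetical predictions (1 = malignant, 0 = benign)
mlp_pred = [1, 1, 0, 0]
dt_pred = [0, 1, 1, 0]
print(majority_vote([svm_pred, mlp_pred, dt_pred]))
```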
... Sharma et al. [6] used three prediction techniques, namely k-Nearest Neighbours, Naive Bayes and Random Forest, on the Wisconsin Breast Cancer dataset and compared the results of the prediction models. Sharma et al. [15] used classifiers such as Logistic Regression, Support Vector Machine and k-Nearest Neighbours on the Wisconsin Breast Cancer dataset to predict whether the tumour is benign or malignant, and compared the accuracy of the three classifiers. Osareh and Shadgar [10] used principal component analysis for feature selection before applying k-Nearest Neighbours, Support Vector Machine and Probabilistic Neural Networks to classify the tumour as malignant or benign. ...
Conference Paper
Breast cancer is a type of invasive cancer that occurs in women. According to the World Health Organization, breast cancer accounts for 18% of all cancer-related deaths among women. After lung cancer, breast cancer is the leading cause of death among women in India. Because of limited access to care, especially in rural areas, not everyone can be diagnosed in time. If breast cancer is detected at an early stage, doctors can suggest an efficient course of treatment, reducing both the mortality rate and medical expenses. This paper therefore presents a comparative study of machine learning and computational intelligence techniques aimed at optimizing the process and achieving better accuracy and precision. The focus of this review is to survey existing articles on breast cancer, mainly those using the Wisconsin dataset obtained from the UCI repository. The review concludes with suggestions for future directions.
... These techniques provide an effective process for extracting the key features that can lead to a proper diagnosis. It has been shown experimentally that machine learning and deep learning algorithms are more efficient than conventional approaches [2]. ...
Chapter
According to global statistics, breast cancer ranks second among all the fatal diseases that cause death. It has adverse effects when it goes unnoticed for a long time. However, early diagnosis allows effective treatment, improving the prognosis and the chance of survival. Therefore, accurate classification of benign tumors is necessary to improve people's lives, and precision in the diagnosis of breast cancer has become a significant research topic. Among the many new methodologies and techniques proposed, machine learning algorithms and artificial intelligence concepts lead to accurate diagnosis, thereby improving the survival rate of women. The main aim of this research work is to summarize studies on predicting and classifying breast cancer using data mining techniques.
Chapter
Image processing has become an important tool in medical applications, with the ability to extract and analyze information from medical images. This chapter provides an overview of various image processing approaches used in medical applications, including deep learning algorithms, segmentation techniques, and a combination of both. The authors also discuss several studies on brain tumor detection, cancer detection, and X-ray analysis using image processing techniques. The studies demonstrate the potential of image processing techniques to significantly improve the accuracy and speed of disease detection, allowing for earlier diagnosis and treatment. Image processing techniques can also assist in treatment planning and lead to more informed diagnoses and treatment decisions. Continued research in this area will undoubtedly lead to even more advanced and sophisticated approaches to image processing, further enhancing the ability of healthcare professionals to diagnose and treat a wide range of medical conditions.
Article
Breast cancer (BC) is the most widely found disease among women in the world. Early detection of BC can often reduce the mortality rate and improve the probability of providing proper treatment. Hence, this paper focuses on devising an Exponential Honey Badger Optimization-based Deep Convolutional Neural Network (EHBO-based DCNN) for early identification of BC in the Internet of Things (IoT). Here, the Honey Badger Optimization (HBO) and Exponential Weighted Moving Average (EWMA) algorithms are combined to create the EHBO. The EHBO is used to transfer the acquired medical data to the base station (BS) by choosing the best cluster heads, so that BC can be classified. Then, statistical and texture features are extracted, and data augmentation is performed. Finally, BC classification is done by the DCNN. The experimental results reveal that the EHBO-based DCNN algorithm attained outstanding performance, with a testing accuracy, sensitivity, and specificity of 0.9051, 0.8971, and 0.9029, respectively. The accuracy of the proposed method is 7.23%, 6.62%, 5.39%, and 3.45% higher than that of methods such as the multi-layer perceptron (MLP) classifier, deep learning, support vector machine (SVM), and ensemble-based classifier, respectively.
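The abstract does not detail how EWMA is combined with Honey Badger Optimization, so the sketch below shows only the standard exponentially weighted moving average recurrence, s_0 = x_0 and s_t = α·x_t + (1 - α)·s_{t-1}. The function name and the α value are illustrative.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average: s_0 = x_0, s_t = alpha*x_t + (1-alpha)*s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = []
    for x in values:
        # The first value seeds the average; later values are blended with the running mean.
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(ewma([1, 2, 3], alpha=0.5))
```

A larger α tracks recent values more closely; a smaller α smooths more heavily.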
Article
Breast cancer is the most common type of cancer in women worldwide and the leading cause of cancer mortality among females. The aim of this research is to classify the alive or dead status of breast cancer patients using the Surveillance, Epidemiology, and End Results (SEER) dataset. Because of their capacity to handle enormous datasets systematically, machine learning and deep learning have been widely employed in biomedical research to address diverse classification problems. Pre-processing the data enables its visualization and analysis for use in making important decisions. This research presents a feasible machine learning-based approach for classifying the SEER breast cancer dataset. A two-step feature selection method based on Variance Threshold and Principal Component Analysis was employed to select features from the SEER breast cancer dataset. After feature selection, the dataset is classified using supervised and ensemble learning techniques such as AdaBoost, XGBoost, Gradient Boosting, Naive Bayes and Decision Tree. The performance of the various machine learning algorithms is examined using both the train-test split and k-fold cross-validation. The Decision Tree achieved 98% accuracy with both the train-test split and cross-validation. This study observes that the Decision Tree algorithm outperforms the other supervised and ensemble learning approaches on the SEER breast cancer dataset.
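The first step of the two-step feature selection, a variance threshold, can be sketched as below: columns whose variance does not exceed a chosen threshold are dropped before the PCA step. This is a from-scratch illustration with a hypothetical function name; libraries such as scikit-learn provide `VarianceThreshold` for the same purpose, and the threshold used in the study is not stated.

```python
def variance_threshold(X, threshold=0.0):
    """Keep columns whose population variance exceeds `threshold`.

    Returns (kept column indices, reduced matrix).
    """
    n = len(X)
    keep = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            keep.append(j)
    return keep, [[row[j] for j in keep] for row in X]


X = [[1, 0, 5], [2, 0, 5], [3, 0, 6]]   # toy data; column 1 is constant
print(variance_threshold(X))
```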
Preprint
Currently, Breast Cancer (BC) is the second most devastating form of cancer in people, particularly in women. In the healthcare industry, Machine Learning (ML) is commonly employed to predict fatal diseases. Because breast cancer has a favorable prognosis when detected at an early stage, a model was created using the Wisconsin Breast Cancer Dataset (WBCD). The model's overarching aim is to compare the effectiveness of five well-known ML classifiers, Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), and Naive Bayes (NB), with the conventional method. In contrast to the conventional method, the main tactic we used was hyperparameter tuning with grid search, which improved accuracy, precision, recall, and F1 score. With hyperparameter tuning, accuracy increased from 94.15% to 98.83%, whereas with the conventional method it increased from 93.56% to 97.08%. In this investigation, KNN outperformed all other classifiers, achieving an accuracy of 98.83%. In conclusion, our study shows that KNN works well with hyperparameter tuning. These analyses indicate that the proposed prediction approach is useful for prognosis in women with breast cancer, with viable performance and more accurate findings than the conventional approach.
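Grid search simply evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below assumes a caller-supplied `score_fn` (for example, cross-validated KNN accuracy); the parameter names and the toy scoring function are hypothetical. In scikit-learn the same idea is packaged as `GridSearchCV`.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid (name -> candidate list); higher score wins.

    Returns (best params dict, best score).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_neighbors, p):
    # Hypothetical stand-in for a cross-validated accuracy; peaks at n_neighbors=5, p=2.
    return 0.95 - 0.01 * abs(n_neighbors - 5) - 0.02 * abs(p - 2)


best, score = grid_search({"n_neighbors": [1, 3, 5, 7], "p": [1, 2]}, toy_score)
print(best, score)
```

The cost grows multiplicatively with each added parameter, which is why grids are usually kept coarse.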
Thesis
The healthcare industry is one of the most information-intensive sectors. Medical knowledge and information continue to expand every day. It was projected that an acute care hospital may produce five terabytes of data per year (Huang et al. 1996). Such data may be used to collect important healthcare information. Tremendous growth in information technology, software development and system integration technology has produced a new generation of complex computer systems. Information technology researchers face the challenge of keeping pace with this rapid evolution...
Chapter
Breast cancer has become one of the most prominent causes of death in recent years. Among all malignancies in women, it is the most frequent and a major cause of death globally. Manually diagnosing this disease requires considerable time and expertise. Breast cancer detection is time-consuming, and the spread of the disease can be reduced by developing machine-based breast cancer prediction. In machine learning, a system can learn from prior instances and find hard-to-detect patterns in noisy or complicated datasets using various statistical, probabilistic, and optimization approaches. This work compares the classification accuracy, precision, sensitivity, and specificity of several machine learning algorithms on a newly collected dataset. Five machine learning approaches, Decision Tree, Random Forest, Logistic Regression, Naïve Bayes, and XGBoost, were implemented to obtain the best performance on our dataset. This study focuses on finding the algorithm that can forecast breast cancer with maximum accuracy in terms of its classes. The quality of each algorithm's classification was evaluated in terms of efficiency and effectiveness and compared with other published work in this domain. The best model accuracy, 94%, was achieved by Random Forest and XGBoost.
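The four metrics named above all derive from the binary confusion matrix. A sketch, assuming label 1 marks the positive (malignant) class and reporting 0 when a denominator is empty; F1 is included because many of the surveyed studies also report it. The function name and labels are illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary task; `positive` marks the malignant class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    def ratio(num, den):
        return num / den if den else 0.0   # report 0 when a denominator is empty

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)       # also called recall / true positive rate
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical labels
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # hypothetical predictions
print(binary_metrics(y_true, y_pred))
```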
Article
This paper presents a comparative evaluation of classification algorithms using the Waikato Environment for Knowledge Analysis (WEKA) software. The main goal of the paper is to conduct a comprehensive comparison and determine which predictive modelling technique is best for classifying breast cancer recurrence. The dataset for this study consists of 286 instances (201 instances belong to recurrence class and 85 instances belong to non-recurrence class) and 10 attributes. Comparison analysis is conducted for Naïve Bayes, J48, K*, Random Forest, Multilayer Perceptron (MLP) and Support Vector Machine (SVM) models using different parameters. The performance of the developed models is calculated using the following evaluation metrics: accuracy, precision, sensitivity, specificity, mean absolute error, ROC curves and AUC values. The contribution of the attributes to the classification models is assessed by measuring information gain. Results show that the J48 model and the SVM algorithm give the highest accuracies, 75.5% and 79.6%, respectively. The SVM algorithm also shows the highest sensitivity, 99%, while the highest precision, 79%, is obtained by the MLP algorithm. In addition, the SVM algorithm has the lowest mean absolute error. Furthermore, measuring information gain reveals that the degree of malignancy of the tumour contributes more than the other attributes to the recurrence of breast cancer.
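Information gain, used above to rank the attributes, is the reduction in class entropy obtained by splitting on an attribute: IG(Y; X) = H(Y) - Σ_v P(X = v)·H(Y | X = v). A from-scratch sketch for discrete attributes follows; the toy `deg_malig` values are invented for illustration, and WEKA's own attribute evaluator may discretise numeric attributes differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete attribute X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


deg_malig = [3, 3, 2, 1, 1, 2]                         # hypothetical attribute values
recurrence = ["yes", "yes", "no", "no", "no", "yes"]   # hypothetical class labels
print(information_gain(deg_malig, recurrence))
```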
Preprint
Breast cancer has become one of the most prominent causes of death in recent years. Among all malignancies in women, it is the most frequent and a major cause of death globally. Manually diagnosing this disease requires considerable time and expertise. Breast cancer detection is time-consuming, and the spread of the disease can be reduced by developing machine-based breast cancer prediction. In machine learning, a system can learn from prior instances and find hard-to-detect patterns in noisy or complicated datasets using various statistical, probabilistic, and optimization approaches. This work compares the classification accuracy, precision, sensitivity, and specificity of several machine learning algorithms on a newly collected dataset. Five machine learning approaches, Decision Tree, Random Forest, Logistic Regression, Naive Bayes, and XGBoost, were implemented to obtain the best performance on our dataset. This study focuses on finding the algorithm that can forecast breast cancer with maximum accuracy in terms of its classes. The quality of each algorithm's classification was evaluated in terms of efficiency and effectiveness and compared with other published work in this domain. The best model accuracy, 94%, was achieved by Random Forest and XGBoost.
Chapter
Cancer is one of the leading causes of death in the world, and its toll has increased over the past few years. Tumors can be classified as benign or malignant. One of the first and most common cancers to appear in the human body is breast cancer, which, as the name implies, arises in the breast regardless of the person's gender. Machine learning has been widely used to assist in the diagnosis of breast cancer. In this work, feature selection and multi-objective optimization are applied to the Breast Cancer Wisconsin Diagnostic dataset, with the aim of identifying the most relevant features for classifying a diagnosis as benign or malignant. Two classifiers are used in the feature selection task, one based on neural networks and the other on support vector machines. The optimization process maximizes two objective functions simultaneously: sensitivity and specificity. A comparison of the techniques showed better performance by neural networks.
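With two objectives maximised at once, candidate feature subsets are compared by Pareto dominance: a subset dominates another if it is at least as good on both sensitivity and specificity and strictly better on one. The sketch below only filters a given set of candidates down to the non-dominated ones; the subset names and scores are hypothetical, and the chapter's actual multi-objective optimiser is not reproduced here.

```python
def dominates(a, b):
    """True if a is no worse than b on every (maximised) objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """candidates: dict of name -> objective tuple; returns the non-dominated names in input order."""
    return [name for name, obj in candidates.items()
            if not any(dominates(other, obj)
                       for other_name, other in candidates.items() if other_name != name)]


candidates = {                 # hypothetical (sensitivity, specificity) per feature subset
    "subset_A": (0.95, 0.90),
    "subset_B": (0.90, 0.97),
    "subset_C": (0.88, 0.89),
    "subset_D": (0.95, 0.85),
}
print(pareto_front(candidates))
```

Subsets on the front represent different sensitivity/specificity trade-offs; choosing among them is a clinical decision rather than an optimisation one.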
Article
Breast cancer (BC) is the most commonly found disease among women all over the world. Early diagnosis of breast cancer can potentially reduce the mortality rate and increase the chances of successful treatment. This paper proposes a methodology for early diagnosis of breast cancer using the Internet of Things and Machine Learning. Its main objective is to explore machine learning techniques for predicting breast cancer with IoT devices. The proposed classifier achieved precision, recall, F-measure and accuracy of 98%, 97%, 96% and 98%, respectively. The minimum error rates of the classifier were also determined and found to be 34.21%, 45.82% and 64.47% for Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Relative Absolute Error (RAE), respectively. The obtained results show that the MLP classifier yields higher accuracy with a lower error rate than LR and RF.
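The three error measures reported above can be computed as follows, assuming numeric predictions such as the predicted probability of the positive class. RAE divides the total absolute error by that of a baseline that always predicts the mean of the true values; WEKA uses a prior-based baseline for classifiers, so numbers from this sketch may not match WEKA output exactly. Multiply by 100 to express them as percentages. The function name and data are illustrative.

```python
import math


def error_measures(y_true, y_pred):
    """Return (MAE, RMSE, RAE); RAE is undefined if all true values are equal."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    # Relative to the total absolute error of always predicting mean_true.
    rae = sum(abs_err) / sum(abs(t - mean_true) for t in y_true)
    return mae, rmse, rae


y_true = [1, 0, 1, 0]             # hypothetical labels
y_pred = [0.9, 0.2, 0.6, 0.1]     # hypothetical predicted probabilities of class 1
print(error_measures(y_true, y_pred))
```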
Chapter
This paper describes an analysis of the Wisconsin diagnostic breast cancer (WDBC) and Wisconsin prognostic breast cancer (WPBC) datasets using machine learning algorithms (MLA). The datasets contain cell nucleus attributes collected through fine needle aspiration (FNA) of suspicious breast tissue specimens. Features were extracted through digital image processing of the FNA specimens. The datasets were transformed into a lower dimension using principal component analysis, followed by training of MLA such as support vector machines with multiple kernels (linear, polynomial, radial basis function and sigmoid), k-nearest neighbor, linear discriminant analysis, decision tree, logistic regression, and Gaussian Naive Bayes. Classifier accuracy was compared across different train/test splits for WDBC and WPBC, and cross-validation was also performed to obtain more reliable results.
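PCA projects the centred data onto the directions of largest variance. The sketch below finds the leading components by power iteration on the sample covariance matrix, removing each found component (deflation) before searching for the next. It is a dependency-free illustration with hypothetical names; in practice one would use an SVD-based routine such as scikit-learn's `PCA`.

```python
import math
import random


def pca_power(X, n_components=1, n_iter=500, seed=0):
    """Leading principal components of X (rows = samples) via power iteration + deflation.

    Returns (components, projected data).
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]          # centre each feature
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]    # sample covariance
         for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]   # w = C v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:          # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))  # Rayleigh quotient
        components.append(v)
        # Deflation: subtract the found direction so the next pass finds the next component.
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]
    return components, projected


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # toy data lying on the line x2 = 2 * x1
components, projected = pca_power(X, n_components=1)
print(components, projected)
```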
Article
Data mining (DM) consists in analysing a set of observations to find unsuspected relationships and then summarising the data in new ways that are both understandable and useful. It has become widely used in various medical fields including breast cancer (BC), which is the most common cancer and the leading cause of death among women worldwide. BC diagnosis is a challenging medical task and many studies have attempted to apply classification techniques to it. The objective of the present study is to identify studies on classification techniques in BC diagnosis and to analyse them from three perspectives: classification techniques used, accuracy of the classifiers, and comparison of performance. We performed a systematic literature review (SLR) of 176 selected studies published between January 2000 and November 2018. The results show that, of the nine classification techniques investigated, artificial neural networks, support vector machines and decision trees were the most frequently used. Moreover, artificial neural networks, support vector machines and ensemble classifiers performed better than the other techniques, with median accuracy values of 95%, 95% and 96% respectively. Most of the selected studies (57.4%) used datasets containing different types of images such as mammographic, ultrasound, and microarray images.
Chapter
In the healthcare sector, cancer is one of the most threatening and fast-growing diseases. Early diagnosis is very important, as the success of treatment depends on how early and accurately the disease is diagnosed. Machine learning algorithms are helpful in the detection and prediction of diseases, and to improve their efficiency, optimal features need to be selected. This research therefore uses a genetic algorithm to select optimal features before applying k-nearest neighbor (KNN) and weighted k-nearest neighbor (WKNN) to the Wisconsin Breast Cancer Prognosis dataset from the UCI repository. This approach helps in early prediction, and the results show that WKNN performed better, with 86.44% accuracy, than KNN, which gave 83.05% accuracy.
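KNN assigns the majority label among the k nearest training samples; weighted KNN lets closer neighbours count more. The sketch assumes Euclidean distance and inverse-distance weights, which is one common WKNN variant; the study's exact weighting scheme and its genetic-algorithm feature selection are not reproduced, and the names and data are hypothetical.

```python
import math
from collections import defaultdict


def wknn_predict(train_X, train_y, x, k=3, weighted=True, eps=1e-12):
    """k-nearest-neighbour vote with Euclidean distance.

    weighted=True: inverse-distance weights (one WKNN variant); weighted=False: plain KNN.
    """
    nearest = sorted(((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)),
                     key=lambda t: t[0])[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + eps) if weighted else 1.0
    return max(votes, key=votes.get)


train_X = [[0.1], [1.0], [1.2], [9.0]]     # hypothetical 1-D feature values
train_y = ["M", "B", "B", "M"]
query = [0.0]
print(wknn_predict(train_X, train_y, query, k=3, weighted=False),   # plain KNN
      wknn_predict(train_X, train_y, query, k=3, weighted=True))    # weighted KNN
```

The example shows the difference: plain KNN follows the two farther "B" neighbours, while the weighted version lets the very close "M" neighbour dominate.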
Article
Breast cancer is one of the human threats that cause morbidity and mortality worldwide. The death rate can be reduced by advanced diagnosis. The objective of this article is to select a reduced set of features that help in diagnosing breast cancer in the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The proposed model depicts women who have no cancer cells or are in a benign stage but later develop malignant disease (metastases). Owing to the dynamic nature of the big data framework, the proposed method ensures high confidence and low execution time. Moreover, healthcare information grows exponentially, and current database systems cannot adequately manage the massive amount of data, so it is necessary to adopt a "big data" solution for healthcare information.
Article
Dynamic magnetic resonance imaging (MRI) has emerged as a powerful diagnostic tool for breast cancer detection due to its high sensitivity, and it has established a role where findings from conventional mammography techniques are equivocal [1]. In the clinical setting, the ANN has been widely applied in breast cancer diagnosis using a subjective impression of different features based on defined criteria. In this study, feature selection and classification methods based on Artificial Neural Network (ANN) and Support Vector Machine (SVM) are applied to classify breast cancer on dynamic Magnetic Resonance Imaging (MRI). A database of benign and malignant lesions, collected from 2004 to 2006, is used to select the features and to classify them with the proposed methods. A forward selection method is applied to find the best features for classification. Several neural network classifiers, such as MLP, PNN, GRNN and RBF, are applied to a total of 112 histopathologically verified breast lesions to classify them into benign and malignant groups. Support vector machines are also considered as classifiers. The classifiers are trained and tested using four-fold cross-validation.
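Forward selection starts from an empty feature set and greedily adds the feature that most improves a score, stopping when no addition helps. In the setting above, the score would be something like four-fold cross-validated classifier accuracy; here `score_fn` is a hypothetical placeholder so that the search logic can be shown on its own.

```python
def forward_selection(n_features, score_fn, max_features=None):
    """Greedy sequential forward selection; returns (selected feature indices, best score).

    score_fn(subset) must return a number where higher is better.
    """
    selected, best = [], float("-inf")
    limit = n_features if max_features is None else max_features
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(score_fn(selected + [f]), f) for f in candidates]
        top_score, top_f = max(scored, key=lambda t: t[0])   # first best on ties
        if top_score <= best:        # no candidate improves the score: stop
            break
        selected.append(top_f)
        best = top_score
    return selected, best


def toy_score(subset):
    # Hypothetical: features 1 and 3 are informative, and every feature costs 0.1.
    return len(set(subset) & {1, 3}) - 0.1 * len(subset)


print(forward_selection(5, toy_score))
```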
Article
Two medical applications of linear programming are described in this paper. Specifically, linear-programming-based machine learning techniques are used to increase the accuracy and objectivity of breast cancer diagnosis and prognosis. The first application, to breast cancer diagnosis, uses characteristics of individual cells, obtained from a minimally invasive fine needle aspirate, to discriminate benign from malignant breast lumps. This allows an accurate diagnosis without the need for a surgical biopsy. The diagnostic system in current operation at University of Wisconsin Hospitals was trained on samples from 569 patients and has had 100% chronological correctness in diagnosing 131 subsequent patients. The second application, recently put into clinical practice, is a method that constructs a surface that predicts when breast cancer is likely to recur in patients who have had their cancers excised. This gives the physician and the patient better information with which to plan treat...
Book
Topics covered: setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Article
Classification of microcalcification clusters in mammograms plays an essential role in computer-aided diagnosis for the early detection of breast cancer, and support vector machines (SVM) and artificial neural networks (ANN) are two commonly used techniques for this task. Although some work suggests that SVM performs better than ANN, the average accuracy achieved is only around 80% in terms of the area under the receiver operating characteristic curve, Az. Performance may degrade much further when the training samples are imbalanced. To address this, a new strategy, balanced learning with optimized decision making, is proposed to enable effective learning from imbalanced samples, and it is then used to evaluate the performance of ANN and SVM in this context. When the proposed learning strategy is applied to the individual classifiers, results on the DDSM database show that the performance of both ANN and SVM improves significantly. Although ANN outperforms SVM when balanced learning is absent, the two classifiers perform comparably when both balanced learning and optimized decision making are employed. Consequently, an average improvement of more than 10% in both F1 score and Az is achieved for the two classifiers. This validates the effectiveness of the proposed method for classifying clustered microcalcifications.
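The abstract does not spell out how "optimized decision making" works. One common way to make a decision rule sensitive to class imbalance, shown here as an assumption and not as the authors' method, is to tune the classification threshold on held-out classifier scores so that F1 is maximized, instead of using a fixed cut-off. The function names are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when samples with score >= threshold are predicted positive (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try every observed score as a cut-off; return the first one with the highest F1."""
    return max(sorted(set(scores)), key=lambda t: f1_at_threshold(scores, labels, t))
```

With few positives, a fixed cut-off such as 0.5 can produce no positive predictions at all, so F1 drops to 0; sweeping the threshold recovers a usable operating point. The threshold should be chosen on validation data, not on the test set.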
Book
This lively and engaging textbook explains the things you have to know in order to read empirical papers in the social and health sciences, as well as the techniques you need to build statistical models of your own. The author, David A. Freedman, explains the basic ideas of association and regression, and takes you through the current models that link these ideas to causality. The focus is on applications of linear models, including generalized least squares and two-stage least squares, with probits and logits for binary variables. The bootstrap is developed as a technique for estimating bias and computing standard errors. Careful attention is paid to the principles of statistical inference. There is background material on study design, bivariate regression, and matrix algebra. To develop technique, there are computer labs with sample computer programs. The book is rich in exercises, most with answers. Target audiences include advanced undergraduates and beginning graduate students in statistics, as well as students and professionals in the social and health sciences. The discussion in the book is organized around published studies, as are many of the exercises. Relevant journal articles are reprinted at the back of the book. Freedman makes a thorough appraisal of the statistical methods in these papers and in a variety of other examples. He illustrates the principles of modeling, and the pitfalls. The discussion shows you how to think about the critical issues, including the connection (or lack of it) between the statistical models and the real phenomena.
Features of the book:
• authoritative guidance from a well-known author with wide experience in teaching, research, and consulting
• careful analysis of statistical issues in substantive applications
• no-nonsense, direct style
• versatile structure, enabling the text to be used as a text in a course, or read on its own
• text that has been thoroughly class-tested at Berkeley
• background material on regression and matrix algebra
• plenty of exercises, most with solutions
• extra material for instructors, including data sets and code for lab projects (available from Cambridge University Press)
• many new exercises and examples
• reorganized, restructured, and revised chapters to aid teaching and understanding.
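The description mentions the bootstrap as a way to estimate bias and standard errors. A minimal sketch of the nonparametric bootstrap follows, assuming resampling of individual observations with replacement; the function name, the default of 1000 resamples, and the fixed seed are illustrative choices, not taken from the book.

```python
import random
from statistics import mean, stdev


def bootstrap(sample, statistic=mean, n_resamples=1000, seed=0):
    """Return (standard_error, bias) estimates for `statistic` via the nonparametric bootstrap."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement and recompute the statistic.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # SE estimate: spread of the replicates; bias estimate: their mean minus the original value.
    standard_error = stdev(replicates)
    bias = mean(replicates) - statistic(sample)
    return standard_error, bias
```

Passing `statistic=statistics.median` (or any function of a list) gives the corresponding estimates for that statistic with no other changes.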
Article
Correct diagnosis is one of the major problems in the medical field, partly because of the limits of human expertise in diagnosing disease manually. The literature shows that pattern classification techniques such as support vector machines (SVM) and radial basis function neural networks (RBFNN) can help improve diagnosis in this domain. With their remarkable ability to derive meaning from complicated or imprecise data, RBFNN and SVM can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. This paper compares the polynomial-kernel SVM and RBFNN in terms of their diagnostic accuracy on cytological data obtained from the Wisconsin breast cancer database. The data set includes nine attributes and two tumor categories, benign and malignant. Known sets of cytologically proven tumor data were used to train the models to categorize cancer patients according to their diagnosis. Performance measures such as accuracy, specificity, sensitivity and F-score, together with metrics used in medical diagnosis such as Youden’s index and discriminant power, were evaluated to compare the qualities of the classifiers. This research has demonstrated that RBFNN outperformed the polynomial-kernel SVM in correctly classifying the tumors.
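The evaluation measures listed above can all be computed from a binary confusion matrix. The sketch below uses the standard definitions, with Youden's index as sensitivity + specificity - 1 and discriminant power in its commonly used form (sqrt(3)/pi)(ln X + ln Y), where X and Y are the odds of sensitivity and specificity. Treating "malignant" as the positive class and the function name are assumptions for illustration.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Common diagnostic measures from confusion-matrix counts (malignant = positive)."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    youden = sensitivity + specificity - 1
    # Discriminant power: (sqrt(3) / pi) * (ln X + ln Y), X = sens/(1-sens), Y = spec/(1-spec).
    # Undefined when sensitivity or specificity is exactly 0 or 1.
    x = sensitivity / (1 - sensitivity)
    y = specificity / (1 - specificity)
    discriminant_power = math.sqrt(3) / math.pi * (math.log(x) + math.log(y))
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "youden": youden,
        "discriminant_power": discriminant_power,
    }
```

For example, 90 true positives, 10 false negatives, 80 true negatives and 20 false positives give sensitivity 0.9, specificity 0.8, accuracy 0.85 and Youden's index 0.7.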
Article
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimensional feature space, and in this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
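The non-linear mapping described above does not have to be computed explicitly: a polynomial kernel returns the inner product in the mapped space directly. The sketch checks this for the degree-2 kernel (x·z + 1)^2 on 2-D inputs, whose explicit six-dimensional feature map is written out, and shows the standard support-vector decision rule. The support vectors, multipliers `alphas` and offset `b` in the usage are hypothetical values, since training (solving for them) is not shown.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel K(x, z) = (x . z + c) ** degree."""
    return (dot(x, z) + c) ** degree


def phi(x):
    """Explicit feature map for the degree-2 kernel with c = 1 on 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


def svm_decision(x, support_vectors, alphas, labels, b, kernel=poly_kernel):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b, with labels y_i in {-1, +1}."""
    s = sum(a * y * kernel(sv, x) for sv, a, y in zip(support_vectors, alphas, labels)) + b
    return 1 if s >= 0 else -1
```

Here `dot(phi(x), phi(z))` equals `poly_kernel(x, z)` up to rounding, which is why the decision surface can be linear in the feature space while only kernel evaluations on the original inputs are ever computed.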
Article
The aim of this study is to design a classifier-based expert system for early diagnosis of prostate cancer while it is still confined to the organ, supporting informed decision making without biopsy by using selected features. A further aim is to investigate the relationship of BMI (body mass index) and smoking with prostate cancer. The data used in this study were collected from 300 men (100 with prostate adenocarcinoma, 200 with chronic prostatism or benign prostatic hyperplasia). Weight, height, BMI, PSA (prostate-specific antigen), free PSA, age, prostate volume, density, smoking, systolic and diastolic blood pressure, pulse, and Gleason score were used as features, and an independent-samples t-test was applied for feature selection. To classify the data, we used artificial neural networks (ANN) trained with the scaled conjugate gradient (SCG), Broyden–Fletcher–Goldfarb–Shanno (BFGS), and Levenberg–Marquardt (LM) algorithms, and support vector machines (SVM) with linear, polynomial, and radial basis kernel functions. Smoking was found to be a factor that increases prostate cancer risk, whereas BMI was not found to affect it. Since PSA, volume, density, and smoking were statistically significant, they were chosen for classification. The best-performing system used a polynomial kernel function (accuracy: 79%). In the Turkish Family Health System, family physicians, whom patients consult first, could use such an expert system to help map illness risk and direct patients to the correct treatment.
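Feature selection with an independent-samples t-test can be sketched as follows: compute Student's t statistic with pooled variance for each feature between the two groups and keep features whose |t| exceeds a critical value. The cut-off is left as a parameter; for the study's 100 + 200 patients (298 degrees of freedom) the two-sided 5% critical value is roughly 1.97. The function names and the toy data in the usage are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def select_features(cases, controls, names, critical_t):
    """Keep the names of features whose |t| between cases and controls exceeds critical_t."""
    selected = []
    for i, name in enumerate(names):
        t = t_statistic([row[i] for row in cases], [row[i] for row in controls])
        if abs(t) > critical_t:
            selected.append(name)
    return selected
```

This filter scores each feature independently of the classifier, which is what distinguishes it from wrapper approaches such as forward selection.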
Article
Multisurface pattern separation is a mathematical method for distinguishing between elements of two pattern sets. Each element of the pattern sets consists of various scalar observations. In this paper, we use the diagnosis of breast cytology to demonstrate the applicability of this method to medical diagnosis and decision making. Each of 11 cytological characteristics of breast fine-needle aspirates reported to differ between benign and malignant samples was graded 1 to 10 at the time of sample collection. Nine characteristics were found to differ significantly between benign and malignant samples. Mathematically, these values for each sample were represented by a point in a nine-dimensional space of real variables. Benign points were separated from malignant ones by planes determined by linear programming. Correct separation was accomplished in 369 of 370 samples (201 benign and 169 malignant). In the one misclassified malignant case, the fine-needle aspirate cytology was so definitely benign and the cytology of the excised cancer so definitely malignant that we believe the tumor was missed on aspiration. Our mathematical method is applicable to other medical diagnostic and decision-making problems.
Article
INTRODUCTION
We consider the two point-sets $A$ and $B$ in the $n$-dimensional real space $\mathbb{R}^n$, represented by the $m \times n$ matrix $A$ and the $k \times n$ matrix $B$, respectively. Our principal objective here is to formulate a single linear program with the following properties: (i) If the convex hulls of $A$ and $B$ are disjoint, a strictly separating plane is obtained. (ii) If the convex hulls of $A$ and $B$ intersect, a plane is obtained that minimizes some measure of misclassification points, for all possible cases. (iii) No extraneous constraints are imposed on the linear program that rule out any specific case from consideration. Most linear programming formulations [6, 5, 12, 4] have property (i), however
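The excerpt ends before the program itself is stated. For orientation, the robust linear program commonly associated with these three properties can be written as below; it is restated from the standard formulation rather than recovered from the truncated text. Here $e$ is a vector of ones of the appropriate length, the separating plane is $\{x : x^\top w = \gamma\}$, and $y \in \mathbb{R}^m$, $z \in \mathbb{R}^k$ are nonnegative slack vectors measuring how far points of $A$ and $B$ fall on the wrong side of their bounding planes.

$$
\begin{aligned}
\min_{w,\,\gamma,\,y,\,z}\quad & \frac{1}{m}\, e^\top y + \frac{1}{k}\, e^\top z \\
\text{s.t.}\quad & A w - e\gamma + y \ge e, \\
& -B w + e\gamma + z \ge e, \\
& y \ge 0,\quad z \ge 0.
\end{aligned}
$$

When the convex hulls are disjoint, a strictly separating plane can be rescaled so that $Aw \ge e(\gamma + 1)$ and $Bw \le e(\gamma - 1)$ hold with $y = 0$ and $z = 0$, giving objective value 0 and hence property (i); otherwise the averaged slacks serve as the misclassification measure of property (ii).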
Predictive Machine Learning Techniques for Breast Cancer Detection
  • S Kharya
  • D Dubey
  • S Soni
Breast Cancer analysis using Logistic Regression
  • H Yusuff
  • N Mohamad
  • U K Ngah
  • A S Yahaya
An introduction to kernel and nearest-neighbor nonparametric regression
  • N S Altman