Conference Paper

On the Use of Estimated Tumor Marker Classifications in Tumor Diagnosis Prediction - A Case Study for Breast Cancer


Abstract

In this article, we describe the use of tumour marker estimation models in the prediction of tumour diagnoses. In previous works, we have identified classification models for tumour markers that can be used for estimating tumour marker values on the basis of standard blood parameters. These virtual tumour markers are now used in combination with standard blood parameters for learning classifiers that are used for predicting tumour diagnoses. Several data-based modelling approaches implemented in HeuristicLab have been applied for identifying estimators for selected tumour markers and cancer diagnoses: linear regression, k-nearest neighbour (k-NN) learning, artificial neural networks (ANNs) and support vector machines (SVMs) (all optimised using evolutionary algorithms), as well as genetic programming (GP). We have applied these modelling approaches for identifying models for breast cancer diagnoses; in the results section, we summarise classification accuracies for breast cancer and we compare classification results achieved by models that use measured marker values as well as models that use virtual tumour markers.
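The two-stage idea described in the abstract (first estimate a tumour marker class from standard blood parameters, then feed that virtual marker together with the blood parameters into a diagnosis classifier) can be sketched as follows. This is only an illustrative sketch: it uses a plain k-NN classifier written by hand rather than the HeuristicLab implementations used by the authors, the helper names `knn_predict` and `add_virtual_marker` are hypothetical, and all numbers are made-up toy values, not data from the study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda row: math.dist(row[0], query),  # Euclidean distance
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def add_virtual_marker(rows, marker_x, marker_y, k=3):
    """Append the estimated marker class (0 = normal, 1 = elevated) to each row."""
    return [list(r) + [knn_predict(marker_x, marker_y, r, k)] for r in rows]


# Toy, made-up values: two standard blood parameters per patient.
blood = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]
marker_class = [0, 0, 0, 1, 1, 1]  # measured marker: normal / elevated
diagnosis = [0, 0, 0, 1, 1, 1]     # known diagnosis for the training patients

# Stage 1: estimate the virtual marker for the training rows.
# (Estimating it on the rows the estimator was trained on is optimistic;
# a real study would use held-out estimates.)
augmented = add_virtual_marker(blood, blood, marker_class)

# Stage 2: classify a new patient using blood parameters plus the virtual marker.
new_patient = add_virtual_marker([[3.1, 3.0]], blood, marker_class)[0]
prediction = knn_predict(augmented, diagnosis, new_patient)
```

The point of the sketch is the data flow: the diagnosis classifier never sees a measured marker for the new patient, only the estimate produced in the first stage.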


Chapter
In this chapter we present results of empirical research on the data-based identification of estimation models for tumor markers and cancer diagnoses: based on patients’ data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors, we have trained mathematical models that represent virtual tumor markers and predictors for cancer diagnoses, respectively. We have used a medical database compiled at the Central Laboratory of the General Hospital Linz, Austria, and applied several data-based modeling approaches for identifying mathematical models for estimating selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are discussed here. Furthermore, several data-based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected cancer diagnoses: linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms) as well as genetic programming. The investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 81%, 74%, and 91% of the analyzed test cases, respectively; without tumor markers, up to 75%, 74%, and 87% of the test samples are correctly estimated, respectively.
Conference Paper
Standard patient parameters, tumor markers, and tumor diagnosis records are used for identifying prediction models for tumor markers as well as for cancer diagnoses. In this paper we present a hybrid clustering and classification approach that first identifies data clusters (using standard patient data and tumor markers) and then learns prediction models on the basis of these data clusters. The resulting clusters are analyzed and their homogeneity is calculated; the models learned on the basis of these clusters are tested and compared to each other with respect to classification accuracy and variable impacts.
Article
This paper discusses a novel approach for the prediction of breast cancer, melanoma and cancer in the respiratory system using ensemble modeling techniques. For each type of cancer, a set of predictors of differing complexity is learned by symbolic classification based on genetic programming. In addition to standard ensemble modeling, where the prediction is based on a majority vote of the prediction models, two confidence parameters are used which aim to quantify the trustworthiness of each single prediction based on the clearness of the majority vote. Based on the calculated confidence of each ensemble prediction, predictions might be considered uncertain. The experimental part of this paper discusses the increase in accuracy that can be obtained for those samples which are considered trustable, depending on the ratio of predictions that are considered trustable.
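A minimal sketch of majority voting with an agreement-based confidence is given below. The abstract does not define its two confidence parameters, so the measure used here (the share of ensemble members that agree with the winning class) and the `min_confidence` threshold are assumptions made purely for illustration, as is the function name `ensemble_predict`.

```python
from collections import Counter


def ensemble_predict(votes, min_confidence=0.8):
    """Return (label, confidence, trusted) for one sample.

    votes          -- class labels predicted by the individual ensemble members
    min_confidence -- hypothetical threshold below which the ensemble
                      prediction is flagged as uncertain
    """
    if not votes:
        raise ValueError("votes must not be empty")
    counts = Counter(votes)
    # On a tie, the label that appeared first among the votes wins.
    label, top = counts.most_common(1)[0]
    confidence = top / len(votes)  # share of members agreeing with the majority
    return label, confidence, confidence >= min_confidence


clear = ensemble_predict([1, 1, 1, 0, 1])    # 4 of 5 members agree
unclear = ensemble_predict([1, 0, 1, 0, 1])  # only 3 of 5 members agree
```

Raising the threshold leaves fewer predictions marked as trusted, which is the trade-off between accuracy and the ratio of trustable predictions that the abstract refers to.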
Article
In this paper we describe the identification of variable interaction networks in a medical data set. The main goal is to generate mathematical models for standard blood parameters as well as tumor markers using other available parameters in this data set. For each variable we identify those variables that are most relevant for modeling it; the relevance of a variable can in this context be defined via the frequency of its occurrence in models identified by evolutionary machine learning methods or via the decrease in modeling quality after removing it from the data set. Several data-based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected tumor markers and cancer diagnoses: linear regression and support vector machines (optimized using evolutionary algorithms) as well as genetic programming.
Article
This paper presents a simulation study carried out within a private healthcare facility with the aim of understanding whether it can handle a greater flow of incoming patients and how this would affect overall efficiency. The simulation outcomes pointed to the need for an internal work reorganization, which has been devised using Lean Management tools and methodologies. The simulation model was then used to predict the effects of the intended changes as well as their feasibility. Particular attention has been paid to the care administration process; research activities are still ongoing to investigate other processes in the patient value chain where there is still substantial room for improvement. The proposed research work is grounded in an in-depth analysis of the main processes and activities taking place in the healthcare facility, which served as the starting point for the simulation model development. Afterwards, simulation was used for “as-is” analyses and, in combination with Lean Management approaches, for “what-if” studies, whose results and findings are discussed.
Article
The sensitivity and specificity of individual tumor markers rarely meet clinical requirements. This challenge has given rise to many efforts, e.g., combining multiple tumor markers and employing machine learning algorithms. However, results from different studies are often inconsistent, which is partially attributed to the use of different evaluation criteria. Also, the wide use of model-dependent validation leads to a high risk of overfitting when complex models are used for diagnosis. We propose two model-independent criteria, namely area under the curve (AUC) and Relief, to evaluate the diagnostic value of individual and multiple tumor markers, respectively. For diagnostic decision support, we propose the use of a logistic-tree, which combines a decision tree and logistic regression. Application to a colorectal cancer dataset shows that the proposed evaluation criteria produce results that are consistent with current knowledge. Furthermore, the simple and highly interpretable logistic-tree has diagnostic performance that is competitive with that of other, more complex models.
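As an illustration of why AUC is model-independent, it can be computed directly from the raw marker values and the diagnosis labels, with no fitted model involved: it is the probability that a randomly chosen positive case has a higher marker value than a randomly chosen negative case, with ties counted as one half. The sketch below uses this pairwise definition; the function name `auc` and the example values are hypothetical and not taken from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- marker values (higher is assumed to indicate disease)
    labels -- 1 for diseased, 0 for healthy
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up marker values for four patients, two healthy and two diseased.
example = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; a rank-based computation gives the same value more efficiently on large data sets.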
Conference Paper
Tumor markers are substances that are found in blood, urine, or body tissues and that are used as indicators for tumors; elevated tumor marker values can indicate the presence of cancer, but there can also be other causes. We have used a medical database compiled at the blood laboratory of the General Hospital Linz, Austria, in which several blood values of thousands of patients are available as well as several tumor markers. We have used several data-based modeling approaches for identifying mathematical models for estimating selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are analyzed in this paper. The documented tumor marker values are classified as "normal" or "elevated"; our goal is to design classifiers for the respective binary classification problems. As we show in the results section, for the medical modeling tasks described here, genetic programming performs best among those techniques that are able to identify nonlinearities; we also see that GP results show less overfitting than those produced using other methods.
Article
A review of the status of standardization of laboratory tests of particular interest to oncologists is presented. Currently, relatively few of these tests are standardized; as a result, interlaboratory and interinstitutional comparison of data is problematic. In 1992, additional interlaboratory studies of common tumor markers will be initiated by the College of American Pathologists. The National Committee for Clinical Laboratory Standards also has begun to develop standard methods and guidelines for these important tests.
Article
The aims of this study were to evaluate the usefulness of tumor-marker measurements, to identify prognostic factors in patients with cancer of unknown primary (CUP) receiving platinum-based combination chemotherapy, and to verify the applicability of previously reported prognostic models in this population. We conducted univariate and multivariate analyses in consecutive patients with CUP receiving platinum-based combination chemotherapy. Previously reported prognostic models were then validated in this population. A total of 93 patients were analyzed, and the response rate to platinum-based chemotherapeutic regimens among them was 39.8%. The median time to progression and overall survival period were 4.1 and 12.4 months, respectively. The ST-439 level was significantly higher in patients with histologically confirmed adenocarcinoma than in patients with poorly differentiated adenocarcinoma or poorly differentiated carcinoma. A multivariate analysis indicated that performance status, the number of involved organs, and the serum lactate dehydrogenase level were prognostic factors for the outcome. Both of the previously reported prognostic models for predicting the duration of survival were shown to be valid in this population. Tumor-marker measurements are not helpful in the management of patients with CUP. Previously reported prognostic models may be useful for selecting indications for chemotherapy or for stratifying patients in clinical trials.