Chapter

On the Identification of Virtual Tumor Markers and Tumor Diagnosis Predictors Using Evolutionary Algorithms

Abstract

In this chapter we present results of empirical research on the data-based identification of estimation models for tumor markers and cancer diagnoses. Based on patients' data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors, we have trained mathematical models that represent virtual tumor markers and predictors for cancer diagnoses, respectively. We have used a medical database compiled at the Central Laboratory of the General Hospital Linz, Austria, and applied several data-based modeling approaches for identifying mathematical models that estimate selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are discussed here. Furthermore, several data-based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected cancer diagnoses: linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms) as well as genetic programming. The investigated diagnoses of breast cancer, melanoma, and respiratory system cancer are estimated correctly in up to 81%, 74%, and 91% of the analyzed test cases, respectively; without tumor markers, up to 75%, 74%, and 87% of the test samples are estimated correctly.
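As a rough illustration of one of the listed approaches, the following minimal sketch shows k-nearest neighbor classification and how a "fraction of correctly estimated test samples" can be computed. It is not the HeuristicLab implementation used in the chapter; the function names, the Euclidean distance, k = 3, and the tiny synthetic two-feature dataset are all assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training samples."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Synthetic, made-up values (NOT patient data): two numeric features per sample.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["negative", "negative", "negative", "positive", "positive", "positive"]
test_x = [(1.1, 0.9), (3.0, 3.0)]
test_y = ["negative", "positive"]

acc = accuracy(train_x, train_y, test_x, test_y)
print(f"test accuracy: {acc:.2f}")
```

In practice the features would first be scaled to comparable ranges, since distances are otherwise dominated by the variable with the largest numeric spread.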

Article
Battiti's mutual information feature selector (MIFS) and its variants are used in many classification applications. Because they ignore feature synergy, MIFS and its variants can be strongly biased when features cooperate. In addition, they estimate feature redundancy without regard to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information-based feature selection (CMIFS). Based on the link between interaction information and conditional mutual information, CMIFS takes account of both redundancy and synergy interactions of features and identifies discriminative features. In addition, CMIFS combines feature redundancy evaluation with the classification task, which can decrease the probability of mistaking important features for redundant ones during the search. The experimental results show that CMIFS can achieve a higher best classification accuracy than MIFS and its variants, with the same or a smaller number of features (nearly 50%).
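The notion of feature synergy can be made concrete with a small worked example. The sketch below is not the CMIFS algorithm itself; it only computes mutual information and conditional mutual information for discrete variables from their standard entropy identities, using an XOR class label as an assumed toy case. The function names and the data are illustrative.

```python
from collections import Counter
from math import log2


def entropy(columns):
    """Joint Shannon entropy in bits of one or more equally long discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((count / n) * log2(count / n) for count in Counter(rows).values())


def mutual_information(x, c):
    """I(X;C) = H(X) + H(C) - H(X,C)."""
    return entropy([x]) + entropy([c]) - entropy([x, c])


def conditional_mutual_information(x, c, y):
    """I(X;C|Y) = H(X,Y) + H(C,Y) - H(X,C,Y) - H(Y)."""
    return entropy([x, y]) + entropy([c, y]) - entropy([x, c, y]) - entropy([y])


# Toy data: the class label is the XOR of two binary features.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(x, y)]

mi = mutual_information(x, c)                    # x alone says nothing about c
cmi = conditional_mutual_information(x, c, y)    # x becomes informative once y is known
print(f"I(X;C) = {mi:.1f} bits, I(X;C|Y) = {cmi:.1f} bits")
```

Here the feature x looks irrelevant when judged on its own, yet it carries one full bit about the class once y is given. A selector that scores features only individually would discard it.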
Article
This paper presents a new generic Evolutionary Algorithm (EA) for retarding the unwanted effects of premature convergence. This is accomplished by a combination of interacting generic methods. These generalizations of a Genetic Algorithm (GA) are inspired by population genetics and take advantage of the interactions between genetic drift and migration. In this regard a new selection scheme is introduced, which is designed to directly control genetic drift within the population by advantageous self-adaptive steering of the selection pressure. Additionally, this new selection model enables a quite intuitive heuristic for detecting premature convergence. Based upon this newly postulated basic principle, the new selection mechanism is combined with the already proposed Segregative Genetic Algorithm (SEGA), an advanced GA that introduces parallelism mainly to improve global solution quality. As a whole, a new generic evolutionary algorithm (SASEGASA) is introduced. The performance of the algorithm is evaluated on a set of characteristic benchmark problems. Computational results show that the new method is capable of producing highest-quality solutions without any problem-specific additions.
Conference Paper
We review accuracy estimation methods and compare the two most common methods: cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment (over half a million runs of C4.5 and a Naive-Bayes algorithm) to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
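To make the idea of stratified folds concrete, here is a minimal sketch that assigns sample indices to k folds so that every fold holds roughly the same class proportions as the whole dataset. It is a simplified illustration, not the procedure from the study: the round-robin dealing, the function names, and the toy labels are assumptions, and the shuffling that would normally precede the split is omitted to keep the result deterministic.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, class by class, in round-robin order."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Dealing one class at a time keeps the class ratio similar in every fold.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy labels: 20 samples of class "a" and 10 of class "b".
labels = ["a"] * 20 + ["b"] * 10
folds = stratified_folds(labels, 10)
for train, test in train_test_splits(labels, 10):
    print(len(train), sorted(labels[i] for i in test))
```

With these toy labels each of the ten test folds contains two samples of class "a" and one of class "b", mirroring the 2:1 ratio of the full set.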
Article
Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features instead of irrelevant and/or redundant ones. In this paper we propose a novel feature selection measure that is based on mutual information and takes into consideration the interaction between features. The proposed measure is used to determine relevant features from the original feature set for a pattern recognition problem. We use a Support Vector Machine (SVM) classifier to compare the performance of our measure with recently proposed information-theoretic criteria. Very good performance is obtained when applying this method to handwritten digit recognition data.
Conference Paper
Selection for reproduction in the context of Genetic Algorithms uses only one selection scheme to select parent individuals. When considering the model of sexual selection in the area of population genetics, it becomes obvious that the process of choosing mating partners in natural populations is different for male and female individuals. In this paper the authors introduce a new selection paradigm for Genetic Algorithms (SexualGA), based upon the concepts of male vigor and female choice from population genetics, which makes it possible to use two different selection schemes simultaneously within one algorithm. By using this new concept it is possible to simulate sexual selection in natural populations more precisely. Furthermore, SexualGA also offers far more flexibility concerning the adaptivity of selection pressure, enabling the GA user to tune the algorithm more accurately.
Chapter
In terms of goal orientedness, selection is the driving force of Genetic Algorithms (GAs). In contrast to crossover and mutation, selection is completely generic, i.e. independent of the actually employed problem and its representation. GA-selection is usually implemented as selection for reproduction (parent selection). In this paper we propose a second selection step after reproduction which is also absolutely problem independent. This self-adaptive selection mechanism, which will be referred to as offspring selection, is closely related to the general selection model of population genetics. As the problem- and representation-specific implementation of reproduction in GAs (crossover) is often critical in terms of preservation of essential genetic information, offspring selection has proven to be very suited for improving the global solution quality and robustness concerning parameter settings and operators of GAs in various fields of applications. The experimental part of the paper discusses the potential of the new selection model exemplarily on the basis of standardized real-valued test functions in high dimensions.
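The idea of a second selection step after reproduction can be sketched as follows. This is a simplified illustration and not the authors' exact algorithm: the acceptance rule (a child must beat a threshold placed between its worse and better parent), the `comparison_factor` parameter, the top-up with parents when too few children succeed, and the bit-string toy problem are all assumptions made for this example.

```python
import random


def is_successful(child_fit, parent_fits, comparison_factor=1.0):
    """A child is accepted if it beats a threshold between its worse and better parent.

    Maximization is assumed; comparison_factor = 1.0 means "better than the better parent".
    """
    worse, better = min(parent_fits), max(parent_fits)
    return child_fit > worse + comparison_factor * (better - worse)


def offspring_selection_step(population, fitness, reproduce, rng,
                             comparison_factor=1.0, max_attempts=5000):
    """Build the next generation from successful children; return it with the effort spent."""
    accepted, attempts = [], 0
    while len(accepted) < len(population) and attempts < max_attempts:
        p1, p2 = rng.choice(population), rng.choice(population)
        child = reproduce(p1, p2, rng)
        attempts += 1
        if is_successful(fitness(child), (fitness(p1), fitness(p2)), comparison_factor):
            accepted.append(child)
    # Simplification: top up with randomly chosen parents if too few children succeeded.
    while len(accepted) < len(population):
        accepted.append(rng.choice(population))
    # Children generated per population slot; a high value hints at stagnation.
    return accepted, attempts / len(population)


def reproduce(p1, p2, rng):
    """One-point crossover followed by a single bit flip (toy operators)."""
    cut = rng.randrange(1, len(p1))
    child = p1[:cut] + p2[cut:]
    child[rng.randrange(len(child))] ^= 1
    return child


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
new_population, pressure = offspring_selection_step(population, sum, reproduce, rng)
print(f"children generated per slot: {pressure:.2f}")
```

The returned ratio grows as it becomes harder to produce children that outperform their parents, which is one way such a scheme can adapt its own selection pressure without problem-specific knowledge.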
Chapter
Genetic Programming has been slow at realizing other programming paradigms than conventional, deterministic, sequential von-Neumann type algorithms. In this contribution we discuss a new method of execution of programs introduced recently: Algorithmic Chemistries. Therein, register machine instructions are executed in a non-deterministic order, following a probability distribution. Program behavior is thus highly dependent on frequency of instructions and connectivity between registers. Here we demonstrate the performance of GP on evolving solutions to a parity problem in a system of this type.
Conference Paper
The paper presents an original filter approach for effective feature selection in classification tasks with a very large number of input variables. The approach is based on the use of a new information-theoretic selection criterion: the double input symmetrical relevance (DISR). The rationale of the criterion is that a set of variables can return information on the output class that is higher than the sum of the information of each variable taken individually. This property will be made explicit by defining the measure of variable complementarity. A feature selection filter based on the DISR criterion is compared in theoretical and experimental terms to recently proposed information-theoretic criteria. Experimental results on a set of eleven microarray classification tasks show that the proposed technique is competitive with existing filter selection methods.
Conference Paper
This contribution proposes an enhanced and generic selection model for Genetic Algorithms (GAs) and Genetic Programming (GP) which is able to preserve the alleles which are part of a high quality solution. Some selected aspects of these enhanced techniques are discussed exemplarily on the basis of standardized benchmark problems.
Conference Paper
In this paper we present results of empirical research work done on the data based identification of estimation models for cancer diagnoses: Based on patients' data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors we have trained mathematical models for estimating cancer diagnoses. Several data based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected cancer diagnoses: Linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms) as well as genetic programming. The investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 81%, 74%, and 91% of the analyzed test cases, respectively; without tumor markers up to 75%, 74%, and 87% of the test samples are correctly estimated, respectively.
Book
This is one of the only books to provide a complete and coherent review of the theory of genetic programming (GP). In doing so, it provides a coherent consolidation of recent work on the theoretical foundations of GP. A concise introduction to GP and genetic algorithms (GA) is followed by a discussion of fitness landscapes and other theoretical approaches to natural and artificial evolution. Having surveyed early approaches to GP theory it presents new exact schema analysis, showing that it applies to GP as well as to the simpler GAs. New results on the potentially infinite number of possible programs are followed by two chapters applying these new techniques.
Book
Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications discusses algorithmic developments in the context of genetic algorithms (GAs) and genetic programming (GP). It applies the algorithms to significant combinatorial optimization problems and describes structure identification using HeuristicLab as a platform for algorithm development. The book focuses on both theoretical and empirical aspects. The theoretical sections explore the important and characteristic properties of the basic GA as well as main characteristics of the selected algorithmic extensions developed by the authors. In the empirical parts of the text, the authors apply GAs to two combinatorial optimization problems: the traveling salesman and capacitated vehicle routing problems. To highlight the properties of the algorithmic measures in the field of GP, they analyze GP-based nonlinear structure identification applied to time series and classification problems. Written by core members of the HeuristicLab team, this book provides a better understanding of the basic workflow of GAs and GP, encouraging readers to establish new bionic, problem-independent theoretical concepts. By comparing the results of standard GA and GP implementation with several algorithmic extensions, it also shows how to substantially increase achievable solution quality.
Article
The effect of screening with prostate-specific-antigen (PSA) testing and digital rectal examination on the rate of death from prostate cancer is unknown. This is the first report from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial on prostate-cancer mortality. From 1993 through 2001, we randomly assigned 76,693 men at 10 U.S. study centers to receive either annual screening (38,343 subjects) or usual care as the control (38,350 subjects). Men in the screening group were offered annual PSA testing for 6 years and digital rectal examination for 4 years. The subjects and health care providers received the results and decided on the type of follow-up evaluation. Usual care sometimes included screening, as some organizations have recommended. The numbers of all cancers and deaths and causes of death were ascertained. In the screening group, rates of compliance were 85% for PSA testing and 86% for digital rectal examination. Rates of screening in the control group increased from 40% in the first year to 52% in the sixth year for PSA testing and ranged from 41 to 46% for digital rectal examination. After 7 years of follow-up, the incidence of prostate cancer per 10,000 person-years was 116 (2820 cancers) in the screening group and 95 (2322 cancers) in the control group (rate ratio, 1.22; 95% confidence interval [CI], 1.16 to 1.29). The incidence of death per 10,000 person-years was 2.0 (50 deaths) in the screening group and 1.7 (44 deaths) in the control group (rate ratio, 1.13; 95% CI, 0.75 to 1.70). The data at 10 years were 67% complete and consistent with these overall findings. After 7 to 10 years of follow-up, the rate of death from prostate cancer was very low and did not differ significantly between the two study groups. (ClinicalTrials.gov number, NCT00002540.)
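The reported incidence rate ratio can be checked with simple arithmetic. The sketch below only divides the two published incidence rates; the function name is illustrative, and the confidence interval is not recomputed because the underlying person-years are not given in the abstract.

```python
def rate_ratio(rate_group, rate_reference):
    """Ratio of two event rates expressed in the same unit (here per 10,000 person-years)."""
    return rate_group / rate_reference


# Incidence of prostate cancer per 10,000 person-years, as stated in the abstract.
incidence_ratio = rate_ratio(116, 95)
print(f"incidence rate ratio: {incidence_ratio:.2f}")
```

The result rounds to the stated 1.22. Repeating the calculation with the rounded mortality rates (2.0 and 1.7) gives roughly 1.18 rather than the stated 1.13; that gap is within what rounding the rates to one decimal can produce, so the published ratio was presumably computed from unrounded rates.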
Article
The present approach to cancer treatment is often referred to as "trial and error" or "one size fits all." This practice is inefficient and frequently results in inappropriate therapy and treatment-related toxicity. In contrast, personalized treatment has the potential to increase efficacy and decrease toxicity. We reviewed the literature relevant to prognostic, predictive, and toxicity-related markers in cancer, with particular attention to systematic reviews, prospective randomized trials, and guidelines issued by expert panels. To achieve personalized treatment for cancer, we need markers for determining prognosis, predicting response to therapy, and predicting severe toxicity related to treatment. Among the best-validated prognostic markers currently available are serum concentrations of alpha-fetoprotein (AFP), human chorionic gonadotropin (hCG), and lactate dehydrogenase (LDH) for patients with nonseminoma germ cell tumors and tissue concentrations of both urokinase plasminogen activator and plasminogen activator inhibitor 1 (PAI-1) for breast cancer patients. Clinically useful therapy predictive markers are estrogen and progesterone receptors to select patients with breast cancer for treatment with endocrine therapy and human epidermal growth factor receptor 2 (HER-2) to select breast cancer patients for treatment with trastuzumab (Herceptin). Markers available for identifying drug-induced adverse reactions include thiopurine methyltransferase (TPMT) to predict toxicity from thiopurines in the treatment of acute lymphoblastic leukemia and uridine diphosphate glucuronyltransferase to predict toxicity from irinotecan in the treatment of colorectal cancer. Validated prognostic, predictive, and toxicity markers should help cancer treatment move from the current trial-and-error approach to more personalized treatment.
Article
The aim was to investigate the diagnostic utility of CYFRA 21-1 (cytokeratin 19 fragment) as a tumor marker in pleural effusion and evaluate the value of combining CYFRA 21-1 and carcinoembryonic antigen (CEA) assays as a diagnostic aid in the malignant pleural effusion. One hundred and twenty-six patients (72 malignant and 54 benign pleural effusion) were included in this retrospective study. The effusion levels of CYFRA 21-1 and CEA were measured using radioimmunometric assay. The median values of CYFRA 21-1 in benign and malignant pleural effusion are 15 and 70 ng/ml, respectively. Using a cut-off value of 50 ng/ml, defined at 94% specificity, the diagnostic sensitivity of CYFRA 21-1 for non-small cell lung carcinoma (n = 61), squamous cell carcinoma (n = 21), adenocarcinoma (n = 40) and small cell lung cancer (n = 11) was 64, 71, 60 and 18%, respectively. Regardless of cell types, the diagnostic sensitivity of CYFRA 21-1 and CEA in malignant pleural effusion (n = 72) was 57 and 60%, respectively (cut-off value of 10 ng/ml in CEA assay). Combining CEA with CYFRA 21-1, the diagnostic sensitivity may increase up to 72%, which was defined at 89% specificity. CYFRA 21-1 assay may be a useful tumor marker for discriminating benign from malignant pleural effusion, especially in those of non-small cell lung cancer. The combined use of CEA and CYFRA 21-1 assay in the malignant effusion may increase the diagnostic yield compared with CEA or CYFRA 21-1 alone.
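How sensitivity and specificity follow from a chosen cut-off can be sketched in a few lines. The marker values below are invented for illustration and are not data from the study; treating a value strictly above the cut-off as a positive test is likewise an assumption, since the abstract does not state how ties are handled.

```python
def sensitivity_specificity(malignant_values, benign_values, cutoff):
    """Return (sensitivity, specificity) when a value above `cutoff` counts as a positive test."""
    true_positives = sum(v > cutoff for v in malignant_values)
    true_negatives = sum(v <= cutoff for v in benign_values)
    sensitivity = true_positives / len(malignant_values)
    specificity = true_negatives / len(benign_values)
    return sensitivity, specificity


# Hypothetical marker concentrations in ng/ml (NOT measurements from the study).
malignant = [70, 90, 40, 120, 55]
benign = [15, 10, 60, 20, 30]

sens, spec = sensitivity_specificity(malignant, benign, cutoff=50)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Raising the cut-off trades sensitivity for specificity, which is why the abstract reports each sensitivity together with the specificity at which the cut-off was defined.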
Article
Mammalian alpha-fetoprotein (AFP) is classified as a member of the albuminoid gene superfamily consisting of albumin, AFP, vitamin D (Gc) protein, and alpha-albumin. Molecular variants of AFP have long been reported in the biomedical literature. Early studies identified isoelectric pH isoforms and lectin-binding variants of AFP, which differed in their physicochemical properties, but not in amino acid composition. Genetic variants of AFP, differing in mRNA kilobase length, were later extensively described in rodent models during fetal/perinatal stages, carcinogenesis, and organ regeneration. With the advent of monoclonal antibodies in the early 1980s, multiple antigenic epitopes on native AFP were detected and categorized, culminating in the identification of six to seven major epitopes. During this period, various AFP-binding proteins and receptors were reported to inhibit certain AFP immunoreactions. Concomitantly, human and rodent AFP were cloned and the amino acid sequences of the translated proteins were divulged. Once the amino acid composition of the AFP molecule was known, enzymatic fragments could be identified and synthetic peptide segments synthesized. Following discovery of the molten globule form in 1981, the existence of transitory, intermediate forms of AFP was acknowledged and their physiological significance was realized. In the present review, the various isoforms and variants of AFP are discussed in light of their potential biological relevance.
Article
Two methods were used to demonstrate the presence of tumor-specific antigens in adenocarcinomata of the human colon: (a) rabbits were immunized with extracts of pooled colonic carcinomata, and the antitumor antisera thus produced were absorbed with a pooled extract of normal human colon and with human blood components; (b) newborn rabbits were made immunologically tolerant to normal colonic tissue at birth, and were then immunized with pooled tumor material in adult life. Normal and tumor tissues were obtained from the same human donors in order to avoid misinterpretation of results due to individual-specific antigenic differences. The antisera prepared by both methods were tested against normal and tumor antigens by the techniques of agar gel diffusion, immunoelectrophoresis, hemagglutination, PCA, and immunofluorescence. Distinct antibody activity directed against at least two qualitatively tumor-specific antigens, or antigenic determinants, was detected in the antisera prepared by both methods and at least two additional tumor antigens were detected exclusively in antisera prepared by the tolerance technique. Whether these additional antigens were qualitatively different from normal tissue antigens, or merely present in tumor tissue in higher concentrations than in normal tissue has not as yet been determined. Furthermore, it was shown that the tumor-specific antibodies were not directed against bacterial contaminants or against the unusually high concentrations of fibrin found in many neoplastic tissues. It was concluded from these results that the pooled tumor extracts contained tumor-specific antigens not present in normal colonic tissue. Identical tumor-specific antigens were also demonstrated in a number of individual colonic carcinomata obtained from different human donors.
Article
The optimal upper limit of the normal range for prostate-specific antigen (PSA) is unknown. We investigated the prevalence of prostate cancer among men in the Prostate Cancer Prevention Trial who had a PSA level of 4.0 ng per milliliter or less. Of 18,882 men enrolled in the prevention trial, 9459 were randomly assigned to receive placebo and had an annual measurement of PSA and a digital rectal examination. Among these 9459 men, 2950 men never had a PSA level of more than 4.0 ng per milliliter or an abnormal digital rectal examination, had a final PSA determination, and underwent a prostate biopsy after being in the study for seven years. Among the 2950 men (age range, 62 to 91 years), prostate cancer was diagnosed in 449 (15.2 percent); 67 of these 449 cancers (14.9 percent) had a Gleason score of 7 or higher. The prevalence of prostate cancer was 6.6 percent among men with a PSA level of up to 0.5 ng per milliliter, 10.1 percent among those with values of 0.6 to 1.0 ng per milliliter, 17.0 percent among those with values of 1.1 to 2.0 ng per milliliter, 23.9 percent among those with values of 2.1 to 3.0 ng per milliliter, and 26.9 percent among those with values of 3.1 to 4.0 ng per milliliter. The prevalence of high-grade cancers increased from 12.5 percent of cancers associated with a PSA level of 0.5 ng per milliliter or less to 25.0 percent of cancers associated with a PSA level of 3.1 to 4.0 ng per milliliter. Biopsy-detected prostate cancer, including high-grade cancers, is not rare among men with PSA levels of 4.0 ng per milliliter or less--levels generally thought to be in the normal range.
Article
We evaluated the ability of CA15-3 and alkaline phosphatase (ALP) to predict breast cancer recurrence. Data from seven International Breast Cancer Study Group trials were combined. The primary end point was relapse-free survival (RFS) (time from randomization to first breast cancer recurrence), and analyses included 3953 patients with one or more CA15-3 and ALP measurement during their RFS period. CA15-3 was considered abnormal if >30 U/ml or >50% higher than the first value recorded; ALP was recorded as normal, abnormal, or equivocal. Cox proportional hazards models with a time-varying indicator for abnormal CA15-3 and/or ALP were utilized. Overall, 784 patients (20%) had a recurrence, before which 274 (35%) had one or more abnormal CA15-3 and 35 (4%) had one or more abnormal ALP. Risk of recurrence increased by 30% for patients with abnormal CA15-3 [hazard ratio (HR) = 1.30; P = 0.0005], and by 4% for those with abnormal ALP (HR = 1.04; P = 0.82). Recurrence risk was greatest for patients with either (HR = 2.40; P < 0.0001) and with both (HR = 4.69; P < 0.0001) biomarkers abnormal. ALP better predicted liver recurrence. CA15-3 was better able to predict breast cancer recurrence than ALP, but use of both biomarkers together provided a better early indicator of recurrence. Whether routine use of these biomarkers improves overall survival remains an open question.
Article
Context.—Current tumor markers for ovarian cancer still lack adequate sensitivity and specificity to be applicable in large populations. High-throughput proteomic profiling and bioinformatics tools allow for the rapid screening of a large number of potential biomarkers in serum, plasma, or other body fluids. Objective.—To determine whether protein profiles of plasma can be used to identify potential biomarkers that improve the detection of ovarian cancer. Design.—We analyzed plasma samples that had been collected between 1998 and 2001 from patients with sporadic ovarian serous neoplasms before tumor resection at various International Federation of Gynecology and Obstetrics stages (stage I [n = 11], stage II [n = 3], and stage III [n = 29]) and from women without known neoplastic disease (n = 38) using proteomic profiling and bioinformatics. We compared results between the patients with and without cancer and evaluated their discriminatory performance against that of the cancer antigen 125 (CA125) tumor marker. Results.—We selected 7 biomarkers based on their collective contribution to the separation of the 2 patient groups. Among them, we further purified and subsequently identified 3 biomarkers. Individually, the biomarkers did not perform better than CA125. However, a combination of 4 of the biomarkers significantly improved performance (P ≤ .001). The new biomarkers were complementary to CA125. At a fixed specificity of 94%, an index combining 2 of the biomarkers and CA125 achieves a sensitivity of 94% (95% confidence interval, 85%–100.0%) in contrast to a sensitivity of 81% (95% confidence interval, 68%–95%) for CA125 alone. Conclusions.—The combined use of bioinformatics tools and proteomic profiling provides an effective approach to screen for potential tumor markers. 
Comparison of plasma profiles from patients with and without known ovarian cancer uncovered a panel of potential biomarkers for detection of ovarian cancer with discriminatory power complementary to that of CA125. Additional studies are required to further validate these biomarkers.
Article
Background: When ovarian carcinoma is diagnosed in stage I, up to 90% of patients can be cured with surgery and currently available chemotherapy. At present, less than 25% of cases are diagnosed at this stage. To increase the fraction of ovarian cancers detected at an early stage, screening strategies have been devised that utilize a rising serum CA125 level to trigger the performance of transvaginal sonography. One limitation of CA125 as an initial step in such a screening strategy is that up to 20% of ovarian cancers lack expression of the antigen. Serum tumor markers that can be detected in ovarian cancers that lack CA125 expression might improve the sensitivity for early detection. Methods: From 296 ovarian cancers, 65 (22%) were found to have weak or absent CA125 expression on immunoperoxidase staining. Tissue expression of CA125 was compared to serum CA125 levels. Using immunoperoxidase staining of tissue arrays, we have assessed expression of 10 potential serum tumor markers in the 65 epithelial ovarian cancers with little or no CA125 expression and in ovarian cystadenomas, tumors of low malignant potential, normal ovaries, and 16 other normal tissues. Results: Low or absent expression of CA125 in surgical specimens of epithelial ovarian cancer was associated with low levels of serum CA125 in pre-operative serum specimens. In ovarian cancers that lacked CA125, all specimens (100%) expressed human kallikrein 10 (HK10), human kallikrein 6 (HK6), osteopontin (OPN), and claudin 3. A smaller fraction of CA125-deficient ovarian cancers expressed DF3 (95%), vascular endothelial growth factor (VEGF) (81%), MUC1 (62%), mesothelin (MES) (34%), HE4 (32%), and CA19-9 (29%). When reactivity with normal tissues was considered, however, MES and HE4 showed the greatest specificity. Differential expression was also found for HK10, OPN, DF3, and MUC1. 
Conclusions: At the level of tissue expression, each of 10 potential serum markers could be detected in 29-100% of ovarian cancers that had low or absent expression of CA125. Several markers exhibited more intense expression in cancers than in normal organs. Further investigation is needed to demonstrate complementary expression of markers in serum.
Article
This work describes an approach for data analysis based on symbolic regression and genetic programming that produces an overall view of the dependencies of all variables of a system. The identified dependencies are represented in the form of a variable interaction network.
Book
The book covers the most common and important approaches for the identification of nonlinear static and dynamic systems. Additionally, it provides the reader with the necessary background on optimization techniques making the book self-contained. The emphasis is put on modern methods based on neural networks and fuzzy systems without neglecting the classical approaches. The entire book is written from an engineering point-of-view, focusing on the intuitive understanding of the basic relationships. This is supported by many illustrative figures. Advanced mathematics is avoided. Thus, the book is suitable for last year undergraduate and graduate courses as well as research and development engineers in industries. The new edition includes exercises.
Article
Bell System Technical Journal, also pp. 623-656 (October)
Conference Paper
In this paper we describe the use of evolutionary algorithms for the selection of relevant features in the context of tumor marker modeling. Our aim is to identify mathematical models for classifying tumor marker values AFP and CA 15-3 using available patient parameters; data provided by the General Hospital Linz are used. The use of evolutionary algorithms for finding optimal sets of variables is discussed; we also define fitness functions that can be used for evaluating feature sets taking into account the number of selected features as well as the resulting classification accuracies. In the empirical section of this paper we document results achieved using an evolution strategy in combination with several machine learning algorithms (linear regression, k-nearest-neighbor modeling, and artificial neural networks) which are applied using cross-validation for evaluating sets of selected features. The identified sets of relevant variables as well as achieved classification rates are compared.
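A fitness function of the kind described, one that rewards accurate classification while penalizing large feature sets, might look like the following sketch. The weighted-sum form, the weight `alpha`, and the example numbers are assumptions for illustration; the abstract does not give the authors' actual formula.

```python
def feature_set_fitness(error_rate, n_selected, n_total, alpha=0.8):
    """Score a feature set; lower is better.

    error_rate: cross-validated misclassification rate obtained with the selected features.
    alpha:      hypothetical weight trading accuracy against the size of the feature set.
    """
    size_penalty = n_selected / n_total
    return alpha * error_rate + (1 - alpha) * size_penalty


# Two hypothetical candidates with equal error but different numbers of features.
small_set = feature_set_fitness(error_rate=0.20, n_selected=5, n_total=20)
large_set = feature_set_fitness(error_rate=0.20, n_selected=10, n_total=20)
print(f"small set: {small_set:.2f}, large set: {large_set:.2f}")
```

With equal classification error the smaller feature set receives the better (lower) score, so an evolutionary search guided by such a function is pushed toward compact sets of variables.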
Conference Paper
In this article, we describe the use of tumour marker estimation models in the prediction of tumour diagnoses. In previous works, we have identified classification models for tumour markers that can be used for estimating tumour marker values on the basis of standard blood parameters. These virtual tumour markers are now used in combination with standard blood parameters for learning classifiers that are used for predicting tumour diagnoses. Several data-based modelling approaches implemented in HeuristicLab have been applied for identifying estimators for selected tumour markers and cancer diagnoses: linear regression, k-nearest neighbour (k-NN) learning, artificial neural networks (ANNs) and support vector machines (SVMs) (all optimised using evolutionary algorithms), as well as genetic programming (GP). We have applied these modelling approaches for identifying models for breast cancer diagnoses; in the results section, we summarise classification accuracies for breast cancer and we compare classification results achieved by models that use measured marker values as well as models that use virtual tumour markers.
Article
LIBSVM is a library for support vector machines (SVM). Its goal is to help users to easily use SVM as a tool. In this document, we present all its implementation details. For the use of LIBSVM, the README file included in the package and the LIBSVM FAQ provide the information.
Article
Serum assays based on the CA125 antigen are widely used in the monitoring of patients with ovarian cancer; however very little is known about the molecular nature of the CA125 antigen. We recently cloned a partial cDNA (designated MUC16) that codes for a new mucin that is a strong candidate for being the CA125 antigen. This assignment has now been confirmed by transfecting a partial MUC16 cDNA into 2 CA125-negative cell lines and demonstrating the synthesis of CA125 by 3 different assays. Of the 3 antibodies (OC125, M11 and VK-8) tested on the transfected cells, only the first 2 were strongly positive, indicating the differential expression of the CA125 epitopes in these cells. The cloning and expression of CA125 antigen opens the way to an understanding of its function in normal and malignant cells. © 2002 Wiley-Liss, Inc.
Chapter
The sections in this article are: 1. The Problem; 2. Background and Literature; 3. Outline; 4. Displaying the Basic Ideas: ARX Models and the Linear Least Squares Method; 5. Model Structures I: Linear Models; 6. Model Structures II: Nonlinear Black-Box Models; 7. General Parameter Estimation Techniques; 8. Special Estimation Techniques for Linear Black-Box Models; 9. Data Quality; 10. Model Validation and Model Selection; 11. Back to Data: The Practical Side of Identification.
Conference Paper
Tumor markers are substances that are found in blood, urine, or body tissues and that are used as indicators for tumors; elevated tumor marker values can indicate the presence of cancer, but there can also be other causes. We have used a medical database compiled at the blood laboratory of the General Hospital Linz, Austria: Several blood values of thousands of patients are available as well as several tumor markers. We have used several data based modeling approaches for identifying mathematical models for estimating selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are analyzed in this paper. The documented tumor marker values are classified as "normal" or "elevated"; our goal is to design classifiers for the respective binary classification problems. As we show in the results section, for those medical modeling tasks described here, genetic programming performs best among those techniques that are able to identify nonlinearities; we also see that GP results show less overfitting than those produced using other methods.
Article
We propose in this paper a very fast feature selection technique based on conditional mutual information. By picking features which maximize their mutual information with the class to predict conditional on any feature already picked, it ensures the selection of features which are both individually informative and two-by-two weakly dependent. We show that this feature selection method outperforms other classical algorithms, and that a naive Bayesian classifier built with features selected that way achieves error rates similar to those of state-of-the-art methods such as boosting or SVMs. The implementation we propose selects 50 features among 40,000, based on a training set of 500 examples in a tenth of a second on a standard 1 GHz PC.
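The selection rule described in this abstract can be sketched as a greedy loop: each candidate is scored by the smallest conditional mutual information it shares with the class given any single feature already picked, and the best-scoring candidate is added. The code below is a plain, unoptimized illustration for discrete features; it is not the fast implementation the abstract refers to, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def cond_mutual_info(y, x, z):
    """Estimate I(Y; X | Z) in bits from equal-length discrete sequences."""
    n = len(y)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xv, yv, zv), c in c_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), in counts
        total += (c / n) * log2(c * c_z[zv] / (c_xz[(xv, zv)] * c_yz[(yv, zv)]))
    return total


def mutual_info(y, x):
    """I(Y; X), obtained by conditioning on a constant variable."""
    return cond_mutual_info(y, x, [0] * len(y))


def cmim_select(features, y, k):
    """Greedily pick k column indices from `features` (a list of columns)."""
    # scores[i] holds min over picked j of I(Y; X_i | X_j); before any
    # feature is picked it is simply I(Y; X_i).
    scores = [mutual_info(y, col) for col in features]
    selected = []
    for _ in range(min(k, len(features))):
        candidates = [i for i in range(len(features)) if i not in selected]
        best = max(candidates, key=lambda i: scores[i])  # ties: lowest index
        selected.append(best)
        for i in candidates:
            if i != best:
                scores[i] = min(
                    scores[i], cond_mutual_info(y, features[i], features[best])
                )
    return selected
```

For example, with columns `a = [0, 0, 1, 1]`, an exact copy of `a`, and `b = [0, 1, 0, 1]`, and a four-valued class `y = [0, 1, 2, 3]`, calling `cmim_select([a, list(a), b], y, 2)` picks `a` first and then `b`: the copy carries no additional information about `y` once `a` is known, so its score drops to zero.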
Article
Feature Filters are among the simplest and fastest approaches to feature selection. A filter defines a statistical criterion, used to rank features on how useful they are expected to be for classification. The highest ranking features are retained, and the lowest ranking can be discarded. A common approach is to use the Mutual Information between the feature and class label. This area has seen a recent flurry of activity, resulting in a confusing variety of heuristic criteria all based on mutual information, and a lack of a principled way to understand or relate them. The contribution of this paper is a unifying theoretical understanding of such filters. In contrast to current methods which manually construct filter criteria with particular properties, we show how to naturally derive a space of possible ranking criteria. We will show that several recent contributions in the feature selection literature are points within this continuous space, and that there exist many points that have never been explored.
Article
To evaluate the relationship between serum CA125 tumour marker level before and after surgery of epithelial ovarian carcinoma and assess its potential role as a prognostic factor. A retrospective review of 87 patients with epithelial ovarian carcinoma at a single centre between January 2001 and December 2005 was performed. Serum CA125 levels were assessed for their relationship to pathological stage, tumour grade, tumour volume and age as well as overall survival. A total of 75 patients, mean age 58.94 years and median follow-up of 24 months were included in the analysis. While the preoperative CA125 level did not correlate significantly with stage, tumour grade or survival, the postoperative CA125 correlated to FIGO stage (p<0.0001), tumour grade (p<0.0001) and overall survival (p=0.01). Reduced survival was noted with increasing age at the time of surgery (p=0.009) and bulk of the residual disease postoperatively (p=0.011).
Article
To investigate the clinical application value of serum tumor markers detection combined with support vector machine (SVM) model in the diagnosis of oral squamous cell carcinoma. Serum levels of neuron-specific enolase (NSE), cancer antigen 242 (CA242), cancer antigen 19-9 (CA199), carcinoembryonic antigen (CEA), tissue polypeptide antigen (TPA), cancer antigen 72-4 (CA724), cancer antigen 21-1 (CA211) and alpha fetoprotein (AFP) were detected with enzyme-linked immunosorbent assay (ELISA) and time-resolved fluoroimmunoassay (TRFIA) in 163 oral squamous cell carcinoma patients and 160 healthy persons. All the data was analyzed with SVM; the SVM models for diagnosis of oral squamous cell carcinoma were created, trained and validated by cross validation. Among the 163 oral squamous cell carcinoma patients, there were 128 males and 35 females with the male-to-female ratio of 3.66:1; the age ranged from 30 to 85 years old with a mean age of 59.3 years old; according to the primary site of tumor, 72 cases in tongue, 34 in gingiva, 22 in buccal mucosa, 15 in palatal mucosa, 13 in floor of mouth, 4 in lip and 3 in retromolar region; according to the TNM-UICC classification, there were 33 patients at stage T1, 72 at T2, 44 at T3, 14 at T4, 119 at N0, 42 at N1, 2 at N2, 159 at M0, 4 at M1, 27 at clinical stage I, 51 at stage II, 52 at III, and 33 at IV; according to the pathological differentiation grade, 109 tumors were well differentiated, 42 were moderately differentiated and 12 were poorly differentiated. Five serum tumor markers of CA211, CA199, TPA, CA724 and NSE were selected optimally to create the optimal SVM model for diagnosis of oral squamous cell carcinoma. The accuracy, specificity, sensitivity and positive predictive value of the optimal SVM model were 88.54%, 93.13%, 84.05% and 92.57%, respectively. From the results, SVM model combined with 5 optimal serum tumor markers is suggested to be used in the diagnosis of oral squamous cell carcinoma. 
Supported by Shanghai Leading Academic Discipline Project (Grant No.Y0203).
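The four figures reported in this abstract follow from a 2×2 confusion matrix. The sketch below shows the arithmetic. The abstract does not report the cell counts; the values used here (137 true positives, 26 false negatives, 149 true negatives, 11 false positives) are an inference that happens to reproduce the stated percentages for 163 patients and 160 healthy persons, and should be read as an assumption rather than as data from the study.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard diagnostic measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # share of patients detected
        "specificity": tn / (tn + fp),   # share of healthy persons cleared
        "ppv": tp / (tp + fp),           # positive predictive value
    }


# Inferred counts (assumption, not reported in the abstract):
# 163 patients = 137 TP + 26 FN, 160 healthy persons = 149 TN + 11 FP.
metrics = diagnostic_metrics(tp=137, fn=26, tn=149, fp=11)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.3f}%")
```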
Article
A review of the status of standardization of laboratory tests of particular interest to oncologists is presented. Currently, relatively few of these tests are standardized; as a result, interlaboratory and interinstitutional comparison of data is problematic. In 1992, additional interlaboratory studies of common tumor markers will be initiated by the College of American Pathologists. The National Committee for Clinical Laboratory Standards also has begun to develop standard methods and guidelines for these important tests.
Article
The human CEA family has been fully characterized. It comprises 29 genes of which 18 are expressed; 7 belonging to the CEA subgroup and 11 to the pregnancy specific glycoprotein subgroup. CEA is an important tumor marker for colorectal and some other carcinomas. The CEA subgroup members are cell membrane associated and show a complex expression pattern in normal and cancerous tissues with notably CEA showing a selective epithelial expression. Several CEA subgroup members possess cell adhesion properties and the primordial member, biliary glycoprotein, seems to function in signal transduction or regulation of signal transduction possibly in association with other CEA sub-family members. A modified ITAM/ITIM motif is identified in the cytoplasmatic domain of BGP. A role of CEA in innate immunity is envisioned.
Article
Current tumor markers for ovarian cancer still lack adequate sensitivity and specificity to be applicable in large populations. High-throughput proteomic profiling and bioinformatics tools allow for the rapid screening of a large number of potential biomarkers in serum, plasma, or other body fluids. To determine whether protein profiles of plasma can be used to identify potential biomarkers that improve the detection of ovarian cancer. We analyzed plasma samples that had been collected between 1998 and 2001 from patients with sporadic ovarian serous neoplasms before tumor resection at various International Federation of Gynecology and Obstetrics stages (stage I [n = 11], stage II [n = 3], and stage III [n = 29]) and from women without known neoplastic disease (n = 38) using proteomic profiling and bioinformatics. We compared results between the patients with and without cancer and evaluated their discriminatory performance against that of the cancer antigen 125 (CA125) tumor marker. We selected 7 biomarkers based on their collective contribution to the separation of the 2 patient groups. Among them, we further purified and subsequently identified 3 biomarkers. Individually, the biomarkers did not perform better than CA125. However, a combination of 4 of the biomarkers significantly improved performance (P ≤ .001). The new biomarkers were complementary to CA125. At a fixed specificity of 94%, an index combining 2 of the biomarkers and CA125 achieves a sensitivity of 94% (95% confidence interval, 85%-100.0%) in contrast to a sensitivity of 81% (95% confidence interval, 68%-95%) for CA125 alone. The combined use of bioinformatics tools and proteomic profiling provides an effective approach to screen for potential tumor markers. Comparison of plasma profiles from patients with and without known ovarian cancer uncovered a panel of potential biomarkers for detection of ovarian cancer with discriminatory power complementary to that of CA125.
Additional studies are required to further validate these biomarkers.
Article
To evaluate the usefulness of tumor-marker measurements and to identify prognostic factors in patients with cancer of unknown primary (CUP) receiving platinum-based combination chemotherapy, and to verify the adjustment of previously reported prognostic models in this population. We conducted univariate and multivariate analyses in consecutive patients with CUP receiving platinum-based combination chemotherapy. Previously reported prognostic models were then validated in this population. A total of 93 patients were analyzed and the response rate to platinum-based chemotherapeutic regimens among the 93 patients was 39.8%. The median time to progression and overall survival period were 4.1 and 12.4 months, respectively. The ST-439 level was significantly higher in patients with histologically confirmed adenocarcinoma than in patients with poorly differentiated adenocarcinoma or poorly differentiated carcinoma. A multivariate analysis indicated that performance status, the number of involved organs, and the serum lactate dehydrogenase level were the prognostic factors of the outcome. Both the previously reported prognostic models for predicting the duration of survival in this population were shown to be valid. Tumor-marker measurements are not helpful in the management of patients with CUP. Previously reported prognostic models may be useful for selecting patients for chemotherapy or for stratifying patients in clinical trials.