Conference Paper

Classification of tumor marker values using heuristic data mining methods

Abstract

Tumor markers are substances that are found in blood, urine, or body tissues and that are used as indicators for tumors; elevated tumor marker values can indicate the presence of cancer, but there can also be other causes. We have used a medical database compiled at the blood laboratory of the General Hospital Linz, Austria: several blood values of thousands of patients are available as well as several tumor markers. We have used several data-based modeling approaches for identifying mathematical models for estimating selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are analyzed in this paper. The documented tumor marker values are classified as "normal" or "elevated"; our goal is to design classifiers for the respective binary classification problems. As we show in the results section, for the medical modeling tasks described here, genetic programming performs best among the techniques that are able to identify nonlinearities; we also see that GP results show less overfitting than those produced using other methods.

... We have used a medical database compiled at the central laboratory of AKH: 28 routinely measured blood values of patients are available as well as several tumor markers (substances found in humans that can be used as indicators for certain types of cancer). This paper describes research that is a continuation of the results presented at further GECCO Workshops on Medical Applications of Genetic and Evolutionary Computation: In [25] we reported on the data based identification of mathematical models for tumor markers (i.e., virtual tumor markers), and in [26] we discussed the use of several evolutionary machine learning techniques for identifying predictors for cancer diagnoses. ...
... Please note that of course not all values are available in all samples; there are many missing values simply because not all blood values are measured during each examination. Further details about the data set and the applied preprocessing methods can be found in [25] and [26]. ...
... Information about the standard parameters (general patient information and standard blood values) stored in the AKH database (which are listed in the upper part of Table 1) can be found, for example, in [13], [23], and [25]. ...
Conference Paper
In this paper we discuss the effects of using pre-clustered data on the identification of estimation models for cancer diagnoses. Based on patients' data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors, the goal is to identify mathematical models for estimating cancer diagnoses. We have applied a hybrid clustering and classification approach that first identifies data clusters (using standard patient data and tumor markers) and then learns prediction models on the basis of these data clusters. In the empirical section we analyze the clusters of patient data samples formed using k-means clustering: The optimal number of clusters is identified, and we investigate the homogeneity of these clusters. Several evolutionary modeling approaches implemented in HeuristicLab have been applied for subsequently identifying estimators for selected cancer diagnoses: Linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms) as well as genetic programming. As we show in the results section, the investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 84.2%, 80.3%, and 94.1% of the analyzed test cases, respectively; without tumor markers up to 78.2%, 78%, and 93.3% of the test samples are correctly estimated, respectively.
... Please note that of course not all values are available in all samples; there are many missing values simply because not all blood values are measured during each examination. Further details about the data set and the necessary data preprocessing steps can be found, for example, in Winkler et al. (2010) and Winkler et al. (2011). ...
... As described in Winkler et al. (2010) and Winkler et al. (2011), information about the following tumor markers is stored in the AKH database: AFP, CA 125, CA 15-3, CA 19-9, CEA, CYFRA, fPSA, NSE, PSA, S-100, SCC, and TPS. ...
Article
In this paper we describe the identification of variable interaction networks in a medical data set. The main goal is to generate mathematical models for standard blood parameters as well as tumor markers using other available parameters in this data set. For each variable we identify those variables that are most relevant for modeling it; relevance of a variable can in this context be defined via the frequency of its occurrence in models identified by evolutionary machine learning methods or via the decrease in modeling quality after removing it from the data set. Several data based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected tumor markers and cancer diagnoses: Linear regression and support vector machines (optimized using evolutionary algorithms) as well as genetic programming.
... Apart from applications of GP in the analysis of data of technical systems, we have also done intensive research on the analysis of medical data using heuristic data mining methods. We have described our research results (especially on the data based design of prediction models for the presence of tumors and other diseases) in numerous publications, please see for example [29], [34], [25], [3], or [26] for details. ...
Article
In this paper we summarize the use of genetic programming (GP) in nonlinear system identification: After giving a short introduction to evolutionary computation and genetic algorithms, we describe the basic principles of genetic programming and how it is used for data based identification of nonlinear mathematical models. Furthermore, we summarize projects in which we have successfully applied GP in R&D projects in the last years; we also give a summary of several algorithmic enhancements that have been successfully researched in the last years (including offspring selection, on-line and sliding window GP, operators for monitoring genetic process dynamics, and the design of cooperative evolutionary data mining agents). A short description of HeuristicLab (HL), the optimization framework developed by the HEAL research group, and the use of the GP implementations in HL are given in the appendix of this paper.
... 2) fault diagnosis such as network troubleshooting [22]; 3) medical diagnosis of rare conditions [21], [23], [24]; 4) bioinformatics tasks such as protein classification [25]; 5) financial modeling such as insurance approval [10] or bankruptcy prediction [26]; 6) object detection such as target [8], face [9], or pedestrian [27] detection (in large images). Much related work in developing methods to address the class imbalance problem involves two main approaches. ...
Article
Machine learning algorithms such as genetic programming (GP) can evolve biased classifiers when data sets are unbalanced. Data sets are unbalanced when at least one class is represented by only a small number of training examples (called the minority class) while other classes make up the majority. In this scenario, classifiers can have good accuracy on the majority class but very poor accuracy on the minority class(es) due to the influence that the larger majority class has on traditional training criteria in the fitness function. This paper aims to both highlight the limitations of the current GP approaches in this area and develop several new fitness functions for binary classification with unbalanced data. Using a range of real-world classification problems with class imbalance, we empirically show that these new fitness functions evolve classifiers with good performance on both the minority and majority classes. Our approaches use the original unbalanced training data in the GP learning process, without the need to artificially balance the training examples from the two classes (e.g., via sampling).
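As a minimal sketch of why plain accuracy can mislead on unbalanced data, the Python snippet below compares overall accuracy with the average of the per-class accuracies. The averaged measure is a commonly used alternative and is shown only as an illustration; it is not claimed to be one of the fitness functions proposed in the paper, and all function names are hypothetical.

```python
from typing import Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all samples that are classified correctly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class accuracies, so every class has the same weight."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    per_class = []
    for label in sorted(set(y_true)):
        # Indices of all samples that truly belong to this class.
        members = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in members)
        per_class.append(hits / len(members))
    return sum(per_class) / len(per_class)


# Illustrative data (assumed, not from the paper): 95 majority samples (0),
# 5 minority samples (1), and a trivial classifier that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(overall_accuracy(y_true, y_pred))        # high despite missing every minority sample
print(average_class_accuracy(y_true, y_pred))  # penalizes the ignored minority class
```

A fitness function built on the second measure would no longer reward a classifier for ignoring the minority class, which is the kind of bias the abstract describes.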
Chapter
In this chapter we present results of empirical research work done on the data based identification of estimation models for tumor markers and cancer diagnoses: Based on patients’ data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors we have trained mathematical models that represent virtual tumor markers and predictors for cancer diagnoses, respectively. We have used a medical database compiled at the Central Laboratory of the General Hospital Linz, Austria, and applied several data based modeling approaches for identifying mathematical models for estimating selected tumor marker values on the basis of routinely available blood values; in detail, estimators for the tumor markers AFP, CA-125, CA15-3, CEA, CYFRA, and PSA have been identified and are discussed here. Furthermore, several data based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected cancer diagnoses: Linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms) as well as genetic programming. The investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 81%, 74%, and 91% of the analyzed test cases, respectively; without tumor markers up to 75%, 74%, and 87% of the test samples are correctly estimated, respectively.
Conference Paper
Standard patient parameters, tumor markers, and tumor diagnosis records are used for identifying prediction models for tumor markers as well as cancer diagnosis predictions. In this paper we present a hybrid clustering and classification approach that first identifies data clusters (using standard patient data and tumor markers) and then learns prediction models on the basis of these data clusters. The so formed clusters are analyzed and their homogeneity is calculated; the models learned on the basis of these clusters are tested and compared to each other with respect to classification accuracy and variable impacts.
Article
In this paper we describe the integration of ensemble modeling into genetic programming based classification and discuss how genetic programming specific features can be used to obtain new confidence indicators that estimate the trustworthiness of predictions. These new concepts are tested on a real world dataset from the field of medical diagnosis for cancer prediction where the trustworthiness of modeling results is of highest importance.
Article
In this paper we discuss heterogeneous estimation model ensembles for cancer diagnoses produced using various machine learning algorithms. Based on patients' data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors, the goal is to identify mathematical models for estimating cancer diagnoses. Several machine learning approaches implemented in HeuristicLab and WEKA have been applied for identifying estimators for selected cancer diagnoses: k-nearest neighbor learning, decision trees, artificial neural networks, support vector machines, random forests, and genetic programming. The models produced using these methods have been combined to heterogeneous model ensembles. All models trained during the learning phase are applied during the test phase; the final classification is annotated with a confidence value that specifies how reliable the models are regarding the presented decision: We calculate the final estimation for each sample via majority voting, and the relative ratio of a sample's majority vote is used for calculating the confidence in the final estimation. We use a confidence threshold that specifies the minimum confidence level that has to be reached; if this threshold is not reached for a sample, then there is no prediction for that specific sample. As we show in the results section, the accuracies of diagnoses of breast cancer, melanoma, and respiratory system cancer can so be increased significantly. We see that increasing the confidence threshold leads to higher classification accuracies, bearing in mind that the ratio of samples, for which there is a classification statement, is significantly decreased.
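The voting scheme described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' implementation: the function name, the use of `None` to signal "no prediction", and the tie-breaking behavior are assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def vote_with_confidence(
    predictions: Sequence[Hashable], threshold: float
) -> Tuple[Optional[Hashable], float]:
    """Combine the class labels predicted by several models for one sample.

    Returns (label, confidence), where confidence is the share of models
    that voted for the majority label. If the confidence is below the
    threshold, the label is None, i.e. no prediction is made.
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")
    counts = Counter(predictions)
    # Assumption: on a tie, the label that was encountered first wins.
    label, votes = counts.most_common(1)[0]
    confidence = votes / len(predictions)
    if confidence < threshold:
        return None, confidence
    return label, confidence


# Five hypothetical models vote on one sample (1 = diagnosis, 0 = no diagnosis).
print(vote_with_confidence([1, 1, 0, 1, 0], threshold=0.6))
print(vote_with_confidence([1, 1, 0, 1, 0], threshold=0.8))
```

Raising the threshold makes the ensemble abstain more often, which matches the trade-off noted in the abstract between higher accuracy and a smaller share of classified samples.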
Conference Paper
In this paper we describe the use of evolutionary algorithms for the selection of relevant features in the context of tumor marker modeling. Our aim is to identify mathematical models for classifying tumor marker values AFP and CA 15-3 using available patient parameters; data provided by the General Hospital Linz are used. The use of evolutionary algorithms for finding optimal sets of variables is discussed; we also define fitness functions that can be used for evaluating feature sets taking into account the number of selected features as well as the resulting classification accuracies. In the empirical section of this paper we document results achieved using an evolution strategy in combination with several machine learning algorithms (linear regression, k-nearest-neighbor modeling, and artificial neural networks) which are applied using cross-validation for evaluating sets of selected features. The identified sets of relevant variables as well as achieved classification rates are compared.
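A fitness function that weighs the number of selected features against the classification accuracy could look like the Python sketch below. The weighted-sum form, the weight `alpha`, and the function name are assumptions made for illustration; the abstract does not state the exact formula used in the paper.

```python
def feature_set_fitness(
    accuracy: float, n_selected: int, n_total: int, alpha: float = 0.1
) -> float:
    """Fitness of a feature set, to be minimized.

    Assumed form: alpha * (share of features used)
                  + (1 - alpha) * (classification error).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    size_penalty = n_selected / n_total
    error = 1.0 - accuracy
    return alpha * size_penalty + (1.0 - alpha) * error


# Two hypothetical feature sets with equal cross-validated accuracy:
# the smaller set receives the lower (better) fitness value.
small = feature_set_fitness(accuracy=0.8, n_selected=5, n_total=20)
large = feature_set_fitness(accuracy=0.8, n_selected=15, n_total=20)
print(small, large)
```

In an evolution strategy, `accuracy` would come from cross-validating a learner on the candidate feature subset, and `alpha` controls how strongly compact feature sets are preferred.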
Conference Paper
In this article, we describe the use of tumour marker estimation models in the prediction of tumour diagnoses. In previous works, we have identified classification models for tumour markers that can be used for estimating tumour marker values on the basis of standard blood parameters. These virtual tumour markers are now used in combination with standard blood parameters for learning classifiers that are used for predicting tumour diagnoses. Several data-based modelling approaches implemented in HeuristicLab have been applied for identifying estimators for selected tumour markers and cancer diagnoses: linear regression, k-nearest neighbour (k-NN) learning, artificial neural networks (ANNs) and support vector machines (SVMs) (all optimised using evolutionary algorithms), as well as genetic programming (GP). We have applied these modelling approaches for identifying models for breast cancer diagnoses; in the results section, we summarise classification accuracies for breast cancer and we compare classification results achieved by models that use measured marker values as well as models that use virtual tumour markers.
Conference Paper
In this paper we report on the use of evolutionary algorithms for optimizing the identification of classification models for selected tumor markers. Our goal is to identify mathematical models that can be used for classifying tumor marker values as normal or as elevated; evolutionary algorithms are used for optimizing the parameters for learning classification models. The sets of variables used as well as the parameter settings for concrete modeling methods are optimized using evolution strategies and genetic algorithms. The performance of these algorithms is analyzed as well as the population diversity progress. In the empirical part of this paper we document modeling results achieved for tumor markers CA 125 and CYFRA using a medical data base provided by the Central Laboratory of the General Hospital Linz; empirical tests are executed using HeuristicLab.
Conference Paper
In this paper we present results of empirical research work done on the data based identification of estimation models for cancer diagnoses: Based on patients' data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors we have trained mathematical models for estimating cancer diagnoses. Several data based modeling approaches implemented in HeuristicLab have been applied for identifying estimators for selected cancer diagnoses: Linear regression, k-nearest neighbor learning, artificial neural networks, and support vector machines (all optimized using evolutionary algorithms) as well as genetic programming. The investigated diagnoses of breast cancer, melanoma, and respiratory system cancer can be estimated correctly in up to 81%, 74%, and 91% of the analyzed test cases, respectively; without tumor markers up to 75%, 74%, and 87% of the test samples are correctly estimated, respectively.
Article
This paper presents a new generic Evolutionary Algorithm (EA) for retarding the unwanted effects of premature convergence. This is accomplished by a combination of interacting generic methods. These generalizations of a Genetic Algorithm (GA) are inspired by population genetics and take advantage of the interactions between genetic drift and migration. In this regard a new selection scheme is introduced, which is designed to directly control genetic drift within the population by advantageous self-adaptive selection pressure steering. Additionally, this new selection model enables a quite intuitive heuristic to detect premature convergence. Based upon this newly postulated basic principle the new selection mechanism is combined with the already proposed Segregative Genetic Algorithm (SEGA), an advanced Genetic Algorithm (GA) that introduces parallelism mainly to improve global solution quality. As a whole, a new generic evolutionary algorithm (SASEGASA) is introduced. The performance of the algorithm is evaluated on a set of characteristic benchmark problems. Computational results show that the new method is capable of producing highest quality solutions without any problem-specific additions.
Conference Paper
Selection for reproduction in the context of Genetic Algorithms uses only one selection scheme to select parent individuals. When considering the model of sexual selection in the area of population genetics it becomes obvious that the process of choosing mating partners in natural populations is different for male and female individuals. In this paper the authors introduce a new selection paradigm for Genetic Algorithms (SexualGA) based upon the concepts of male vigor and female choice of population genetics which provides the possibility to use two different selection schemes simultaneously within one algorithm. By using this new concept it is possible to simulate sexual selection in natural populations more precisely. Furthermore, SexualGA also offers far more flexibility concerning the adaptivity of selection pressure enabling the GA user to tune the algorithm more accurately.
Chapter
In terms of goal orientedness, selection is the driving force of Genetic Algorithms (GAs). In contrast to crossover and mutation, selection is completely generic, i.e. independent of the actually employed problem and its representation. GA-selection is usually implemented as selection for reproduction (parent selection). In this paper we propose a second selection step after reproduction which is also absolutely problem independent. This self-adaptive selection mechanism, which will be referred to as offspring selection, is closely related to the general selection model of population genetics. As the problem- and representation-specific implementation of reproduction in GAs (crossover) is often critical in terms of preservation of essential genetic information, offspring selection has proven to be very suited for improving the global solution quality and robustness concerning parameter settings and operators of GAs in various fields of applications. The experimental part of the paper discusses the potential of the new selection model exemplarily on the basis of standardized real-valued test functions in high dimensions.
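The idea of a second, problem-independent selection step after reproduction can be illustrated with the Python sketch below for a minimization problem. It is a simplified reading, not the authors' implementation: the acceptance rule (a child must beat a threshold interpolated between its parents' fitness values), the parameter names `success_ratio`, `comparison_factor`, and `max_selection_pressure`, and the toy crossover and mutation operator are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

Individual = List[float]


def sphere(x: Individual) -> float:
    """Toy real-valued test function (minimum 0 at the origin)."""
    return sum(v * v for v in x)


def offspring_selection_step(
    population: List[Individual],
    fitness: Callable[[Individual], float],
    success_ratio: float = 0.7,
    comparison_factor: float = 1.0,
    max_selection_pressure: float = 50.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Individual], float]:
    """Create one new generation; return it with the selection pressure used."""
    rng = rng or random.Random()
    n = len(population)
    target = int(round(success_ratio * n))  # required number of successful children
    successful: List[Individual] = []
    rejected: List[Individual] = []
    generated = 0
    while len(successful) < target and generated < max_selection_pressure * n:
        p1, p2 = rng.sample(population, 2)  # random parent selection
        # Assumed operator: averaging crossover plus Gaussian mutation.
        child = [(a + b) / 2 + rng.gauss(0.0, 0.1) for a, b in zip(p1, p2)]
        generated += 1
        f1, f2 = fitness(p1), fitness(p2)
        worse, better = max(f1, f2), min(f1, f2)
        # comparison_factor 0: beat the worse parent; 1: beat the better parent.
        threshold = worse + comparison_factor * (better - worse)
        if fitness(child) < threshold:
            successful.append(child)
        else:
            rejected.append(child)
    # Fill the remaining slots with rejected children, then with parents.
    new_population = successful + rejected[: n - len(successful)]
    while len(new_population) < n:
        new_population.append(list(rng.choice(population)))
    return new_population, generated / n


rng = random.Random(42)
pop = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
new_pop, pressure = offspring_selection_step(pop, sphere, rng=rng)
print(len(new_pop), pressure)
```

The returned selection pressure (children generated per population slot) grows as successful offspring become harder to produce, which is what makes the scheme self-adaptive and usable as a convergence indicator in this sketch.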
Conference Paper
This contribution proposes an enhanced and generic selection model for Genetic Algorithms (GAs) and Genetic Programming (GP) which is able to preserve the alleles which are part of a high quality solution. Some selected aspects of these enhanced techniques are discussed exemplarily on the basis of standardized benchmark problems.
Book
Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications discusses algorithmic developments in the context of genetic algorithms (GAs) and genetic programming (GP). It applies the algorithms to significant combinatorial optimization problems and describes structure identification using HeuristicLab as a platform for algorithm development. The book focuses on both theoretical and empirical aspects. The theoretical sections explore the important and characteristic properties of the basic GA as well as main characteristics of the selected algorithmic extensions developed by the authors. In the empirical parts of the text, the authors apply GAs to two combinatorial optimization problems: the traveling salesman and capacitated vehicle routing problems. To highlight the properties of the algorithmic measures in the field of GP, they analyze GP-based nonlinear structure identification applied to time series and classification problems. Written by core members of the HeuristicLab team, this book provides a better understanding of the basic workflow of GAs and GP, encouraging readers to establish new bionic, problem-independent theoretical concepts. By comparing the results of standard GA and GP implementation with several algorithmic extensions, it also shows how to substantially increase achievable solution quality.
Article
The effect of screening with prostate-specific-antigen (PSA) testing and digital rectal examination on the rate of death from prostate cancer is unknown. This is the first report from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial on prostate-cancer mortality. From 1993 through 2001, we randomly assigned 76,693 men at 10 U.S. study centers to receive either annual screening (38,343 subjects) or usual care as the control (38,350 subjects). Men in the screening group were offered annual PSA testing for 6 years and digital rectal examination for 4 years. The subjects and health care providers received the results and decided on the type of follow-up evaluation. Usual care sometimes included screening, as some organizations have recommended. The numbers of all cancers and deaths and causes of death were ascertained. In the screening group, rates of compliance were 85% for PSA testing and 86% for digital rectal examination. Rates of screening in the control group increased from 40% in the first year to 52% in the sixth year for PSA testing and ranged from 41 to 46% for digital rectal examination. After 7 years of follow-up, the incidence of prostate cancer per 10,000 person-years was 116 (2820 cancers) in the screening group and 95 (2322 cancers) in the control group (rate ratio, 1.22; 95% confidence interval [CI], 1.16 to 1.29). The incidence of death per 10,000 person-years was 2.0 (50 deaths) in the screening group and 1.7 (44 deaths) in the control group (rate ratio, 1.13; 95% CI, 0.75 to 1.70). The data at 10 years were 67% complete and consistent with these overall findings. After 7 to 10 years of follow-up, the rate of death from prostate cancer was very low and did not differ significantly between the two study groups. (ClinicalTrials.gov number, NCT00002540.)
Article
The present approach to cancer treatment is often referred to as "trial and error" or "one size fits all." This practice is inefficient and frequently results in inappropriate therapy and treatment-related toxicity. In contrast, personalized treatment has the potential to increase efficacy and decrease toxicity. We reviewed the literature relevant to prognostic, predictive, and toxicity-related markers in cancer, with particular attention to systematic reviews, prospective randomized trials, and guidelines issued by expert panels. To achieve personalized treatment for cancer, we need markers for determining prognosis, predicting response to therapy, and predicting severe toxicity related to treatment. Among the best-validated prognostic markers currently available are serum concentrations of alpha-fetoprotein (AFP), human chorionic gonadotropin (hCG), and lactate dehydrogenase (LDH) for patients with nonseminoma germ cell tumors and tissue concentrations of both urokinase plasminogen activator and plasminogen activator inhibitor 1 (PAI-1) for breast cancer patients. Clinically useful therapy predictive markers are estrogen and progesterone receptors to select patients with breast cancer for treatment with endocrine therapy and human epidermal growth factor receptor 2 (HER-2) to select breast cancer patients for treatment with trastuzumab (Herceptin). Markers available for identifying drug-induced adverse reactions include thiopurine methyltransferase (TPMT) to predict toxicity from thiopurines in the treatment of acute lymphoblastic leukemia and uridine diphosphate glucuronyltransferase to predict toxicity from irinotecan in the treatment of colorectal cancer. Validated prognostic, predictive, and toxicity markers should help cancer treatment move from the current trial-and-error approach to more personalized treatment.
Article
The aim was to investigate the diagnostic utility of CYFRA 21-1 (cytokeratin 19 fragment) as a tumor marker in pleural effusion and evaluate the value of combining CYFRA 21-1 and carcinoembryonic antigen (CEA) assays as a diagnostic aid in the malignant pleural effusion. One hundred and twenty-six patients (72 malignant and 54 benign pleural effusion) were included in this retrospective study. The effusion levels of CYFRA 21-1 and CEA were measured using radioimmunometric assay. The median values of CYFRA 21-1 in benign and malignant pleural effusion are 15 and 70 ng/ml, respectively. Using a cut-off value of 50 ng/ml, defined at 94% specificity, the diagnostic sensitivity of CYFRA 21-1 for non-small cell lung carcinoma (n = 61), squamous cell carcinoma (n = 21), adenocarcinoma (n = 40) and small cell lung cancer (n = 11) was 64, 71, 60 and 18%, respectively. Regardless of cell types, the diagnostic sensitivity of CYFRA 21-1 and CEA in malignant pleural effusion (n = 72) was 57 and 60%, respectively (cut-off value of 10 ng/ml in CEA assay). Combining CEA with CYFRA 21-1, the diagnostic sensitivity may increase up to 72%, which was defined at 89% specificity. CYFRA 21-1 assay may be a useful tumor marker for discriminating benign from malignant pleural effusion, especially in those of non-small cell lung cancer. The combined use of CEA and CYFRA 21-1 assay in the malignant effusion may increase the diagnostic yield compared with CEA or CYFRA 21-1 alone.
Article
Two methods were used to demonstrate the presence of tumor-specific antigens in adenocarcinomata of the human colon: (a) rabbits were immunized with extracts of pooled colonic carcinomata, and the antitumor antisera thus produced were absorbed with a pooled extract of normal human colon and with human blood components; (b) newborn rabbits were made immunologically tolerant to normal colonic tissue at birth, and were then immunized with pooled tumor material in adult life. Normal and tumor tissues were obtained from the same human donors in order to avoid misinterpretation of results due to individual-specific antigenic differences. The antisera prepared by both methods were tested against normal and tumor antigens by the techniques of agar gel diffusion, immunoelectrophoresis, hemagglutination, PCA, and immunofluorescence. Distinct antibody activity directed against at least two qualitatively tumor-specific antigens, or antigenic determinants, was detected in the antisera prepared by both methods and at least two additional tumor antigens were detected exclusively in antisera prepared by the tolerance technique. Whether these additional antigens were qualitatively different from normal tissue antigens, or merely present in tumor tissue in higher concentrations than in normal tissue has not as yet been determined. Furthermore, it was shown that the tumor-specific antibodies were not directed against bacterial contaminants or against the unusually high concentrations of fibrin found in many neoplastic tissues. It was concluded from these results that the pooled tumor extracts contained tumor-specific antigens not present in normal colonic tissue. Identical tumor-specific antigens were also demonstrated in a number of individual colonic carcinomata obtained from different human donors.
Article
The optimal upper limit of the normal range for prostate-specific antigen (PSA) is unknown. We investigated the prevalence of prostate cancer among men in the Prostate Cancer Prevention Trial who had a PSA level of 4.0 ng per milliliter or less. Of 18,882 men enrolled in the prevention trial, 9459 were randomly assigned to receive placebo and had an annual measurement of PSA and a digital rectal examination. Among these 9459 men, 2950 men never had a PSA level of more than 4.0 ng per milliliter or an abnormal digital rectal examination, had a final PSA determination, and underwent a prostate biopsy after being in the study for seven years. Among the 2950 men (age range, 62 to 91 years), prostate cancer was diagnosed in 449 (15.2 percent); 67 of these 449 cancers (14.9 percent) had a Gleason score of 7 or higher. The prevalence of prostate cancer was 6.6 percent among men with a PSA level of up to 0.5 ng per milliliter, 10.1 percent among those with values of 0.6 to 1.0 ng per milliliter, 17.0 percent among those with values of 1.1 to 2.0 ng per milliliter, 23.9 percent among those with values of 2.1 to 3.0 ng per milliliter, and 26.9 percent among those with values of 3.1 to 4.0 ng per milliliter. The prevalence of high-grade cancers increased from 12.5 percent of cancers associated with a PSA level of 0.5 ng per milliliter or less to 25.0 percent of cancers associated with a PSA level of 3.1 to 4.0 ng per milliliter. Biopsy-detected prostate cancer, including high-grade cancers, is not rare among men with PSA levels of 4.0 ng per milliliter or less--levels generally thought to be in the normal range.
Article
Several lines of evidence point towards a biological role of mucin and particularly MUC1 in colorectal cancer. A positive correlation was described between mucin secretion, proliferation, invasiveness, metastasis and bad prognosis. But, the role of MUC1 in cancer progression is still controversial and somewhat confusing. While Mukherjee and colleagues developed MUC1-specific immune therapy in a CRC model, Lillehoj and co-investigators showed recently that MUC1 inhibits cell proliferation by a beta-catenin-dependent mechanism. In carcinoma cells the polarization of MUC1 is lost and the protein is over expressed at high levels over the entire cell surface. A competitive interaction between MUC1 and E-cadherin, through beta-catenin binding, disrupts E-cadherin-mediated cell-cell interactions at sites of MUC1 expression. In addition, the complex of MUC1-beta-catenin enters the nucleus and activates T-cell factor/leukocyte enhancing factor 1 transcription factors and activates gene expression. This mechanism may be similar to that just described for DCC and UNC5H, which induced apoptosis when not engaged with their ligand netrin, but mediate signals for proliferation, differentiation or migration when ligand bound.
Article
Context.—Current tumor markers for ovarian cancer still lack adequate sensitivity and specificity to be applicable in large populations. High-throughput proteomic profiling and bioinformatics tools allow for the rapid screening of a large number of potential biomarkers in serum, plasma, or other body fluids. Objective.—To determine whether protein profiles of plasma can be used to identify potential biomarkers that improve the detection of ovarian cancer. Design.—We analyzed plasma samples that had been collected between 1998 and 2001 from patients with sporadic ovarian serous neoplasms before tumor resection at various International Federation of Gynecology and Obstetrics stages (stage I [n = 11], stage II [n = 3], and stage III [n = 29]) and from women without known neoplastic disease (n = 38) using proteomic profiling and bioinformatics. We compared results between the patients with and without cancer and evaluated their discriminatory performance against that of the cancer antigen 125 (CA125) tumor marker. Results.—We selected 7 biomarkers based on their collective contribution to the separation of the 2 patient groups. Among them, we further purified and subsequently identified 3 biomarkers. Individually, the biomarkers did not perform better than CA125. However, a combination of 4 of the biomarkers significantly improved performance (P ≤ .001). The new biomarkers were complementary to CA125. At a fixed specificity of 94%, an index combining 2 of the biomarkers and CA125 achieves a sensitivity of 94% (95% confidence interval, 85%–100.0%) in contrast to a sensitivity of 81% (95% confidence interval, 68%–95%) for CA125 alone. Conclusions.—The combined use of bioinformatics tools and proteomic profiling provides an effective approach to screen for potential tumor markers. 
Comparison of plasma profiles from patients with and without known ovarian cancer uncovered a panel of potential biomarkers for detection of ovarian cancer with discriminatory power complementary to that of CA125. Additional studies are required to further validate these biomarkers.
Article
LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users easily apply SVMs to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Article
Serum assays based on the CA125 antigen are widely used in the monitoring of patients with ovarian cancer; however very little is known about the molecular nature of the CA125 antigen. We recently cloned a partial cDNA (designated MUC16) that codes for a new mucin that is a strong candidate for being the CA125 antigen. This assignment has now been confirmed by transfecting a partial MUC16 cDNA into 2 CA125-negative cell lines and demonstrating the synthesis of CA125 by 3 different assays. Of the 3 antibodies (OC125, M11 and VK-8) tested on the transfected cells, only the first 2 were strongly positive, indicating the differential expression of the CA125 epitopes in these cells. The cloning and expression of CA125 antigen opens the way to an understanding of its function in normal and malignant cells. © 2002 Wiley-Liss, Inc.
Article
Operational protocols are a valuable means for quality control. However, developing operational protocols is a highly complex and costly task. We present an integrated approach involving both intelligent data analysis and knowledge acquisition from experts that supports the development of operational protocols. The aim is to ensure high quality standards for the protocol through empirical validation during the development, as well as lower development cost through the use of machine learning and statistical techniques. We demonstrate our approach of integrating expert knowledge with data driven techniques based on our effort to develop an operational protocol for the hemodynamic system. (To appear in "Artificial Intelligence in Medicine", thematic issue on Knowledge-Based Information Management in Intensive Care and Anaesthesia.) Morik et al.: Knowledge Discovery and Knowledge Validation in Intensive Care. Key words: operational protocols, online-monitoring, time series a...
Article
Humankind has given itself the scientific name homo sapiens--man the wise--because our mental capacities are so important to our everyday lives and our sense of self. The field of artificial intelligence, or AI, attempts to understand intelligent entities. Thus, one reason to study it is to learn more about ourselves. But unlike philosophy and psychology, which are also concerned with intelligence, AI strives to build intelligent entities as well as understand them. Another reason to study AI is that these constructed intelligent entities are interesting and useful in their own right. AI has produced many significant and impressive products even at this early stage in its development. Although no one can predict the future in detail, it is clear that computers with human-level intelligence (or better) would have a huge impact on our everyday lives and on the future course of civilization. AI addresses one of the ultimate puzzles. How is it possible for a slow, tiny brain, whether biological or electronic, to perceive, understand, predict, and manipulate a world far larger and more complicated than itself? How do we go about making something with those properties? These are hard questions, but unlike the search for faster-than-light travel or an antigravity device, the researcher in AI has solid evidence that the quest is possible. All the researcher has to do is look in the mirror to see an example of an intelligent system. AI is one of the newest disciplines. It was formally initiated in 1956, when the name was coined, although at that point work had been under way for about five years. Along with modern genetics, it is regularly cited as the "field I would most like to be in" by scientists in other disciplines. A student in physics might reasonably feel that all the good ideas have already been taken by Galileo, Newton, Einstein, and the rest, and that it takes many years of study before one can contribute new ideas.
AI, on the other hand, still has openings for a full-time Einstein. The study of intelligence is also one of the oldest disciplines. For over 2000 years, philosophers have tried to understand how seeing, learning, remembering, and reasoning could, or should, be done. The advent of usable computers in the early 1950s turned the learned but armchair speculation concerning these mental faculties into a real experimental and theoretical discipline. Many felt that the new "Electronic Super-Brains" had unlimited potential for intelligence. "Faster Than Einstein" was a typical headline. But as well as providing a vehicle for creating artificially intelligent entities, the computer provides a tool for testing theories of intelligence, and many theories failed to withstand the test--a case of "out of the armchair, into the fire." AI has turned out to be more difficult than many at first imagined, and modern ideas are much richer, more subtle, and more interesting as a result. AI currently encompasses a huge variety of subfields, from general-purpose areas such as perception and logical reasoning, to specific tasks such as playing chess, proving mathematical theorems, writing poetry, and diagnosing diseases. Often, scientists in other fields move gradually into artificial intelligence, where they find the tools and vocabulary to systematize and automate the intellectual tasks on which they have been working all their lives. Similarly, workers in AI can choose to apply their methods to any area of human intellectual endeavor. In this sense, it is truly a universal field.
Article
To evaluate the relationship between serum CA125 tumour marker level before and after surgery of epithelial ovarian carcinoma and assess its potential role as a prognostic factor. A retrospective review of 87 patients with epithelial ovarian carcinoma at a single centre between January 2001 and December 2005 was performed. Serum CA125 levels were assessed for their relationship to pathological stage, tumour grade, tumour volume and age as well as overall survival. A total of 75 patients, mean age 58.94 years and median follow-up of 24 months were included in the analysis. While the preoperative CA125 level did not correlate significantly with stage, tumour grade or survival, the postoperative CA125 correlated to FIGO stage (p<0.0001), tumour grade (p<0.0001) and overall survival (p=0.01). Reduced survival was noted with increasing age at the time of surgery (p=0.009) and bulk of the residual disease postoperatively (p=0.011).
Article
To investigate the clinical value of serum tumor marker detection combined with a support vector machine (SVM) model in the diagnosis of oral squamous cell carcinoma. Serum levels of neuron-specific enolase (NSE), cancer antigen 242 (CA242), cancer antigen 19-9 (CA199), carcinoembryonic antigen (CEA), tissue polypeptide antigen (TPA), cancer antigen 72-4 (CA724), cancer antigen 21-1 (CA211) and alpha fetoprotein (AFP) were detected with enzyme-linked immunosorbent assay (ELISA) and time-resolved fluoroimmunoassay (TRFIA) in 163 oral squamous cell carcinoma patients and 160 healthy persons. All the data were analyzed with SVM; the SVM models for diagnosis of oral squamous cell carcinoma were created, trained and validated by cross validation. Among the 163 oral squamous cell carcinoma patients, there were 128 males and 35 females, a male-to-female ratio of 3.66:1; age ranged from 30 to 85 years with a mean of 59.3 years; by primary tumor site, there were 72 cases in the tongue, 34 in the gingiva, 22 in the buccal mucosa, 15 in the palatal mucosa, 13 in the floor of the mouth, 4 in the lip and 3 in the retromolar region; according to the TNM-UICC classification, there were 33 patients at stage T1, 72 at T2, 44 at T3, 14 at T4, 119 at N0, 42 at N1, 2 at N2, 159 at M0, 4 at M1, 27 at clinical stage I, 51 at stage II, 52 at III, and 33 at IV; by pathological differentiation grade, 109 tumors were well differentiated, 42 were moderately differentiated and 12 were poorly differentiated. Five serum tumor markers (CA211, CA199, TPA, CA724 and NSE) were selected to build the optimal SVM model for diagnosis of oral squamous cell carcinoma. The accuracy, specificity, sensitivity and positive predictive value of the optimal SVM model were 88.54%, 93.13%, 84.05% and 92.57%, respectively. Based on these results, an SVM model combined with the 5 selected serum tumor markers is suggested for use in the diagnosis of oral squamous cell carcinoma.
Supported by Shanghai Leading Academic Discipline Project (Grant No.Y0203).
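The four performance figures quoted in the abstract above all follow from a single 2x2 confusion matrix. The minimal Python sketch below shows the arithmetic. The individual cell counts (137 true positives, 26 false negatives, 149 true negatives, 11 false positives) are not stated in the abstract; they are an assumption, back-calculated so that they match the reported group sizes (163 patients, 160 healthy persons) and the reported percentages. The function name `diagnostic_metrics` is likewise only illustrative.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all subjects classified correctly
        "sensitivity": tp / (tp + fn),      # share of patients flagged as positive
        "specificity": tn / (tn + fp),      # share of healthy persons flagged as negative
        "ppv": tp / (tp + fp),              # share of positive calls that are true patients
    }


# ASSUMPTION: these counts are back-calculated from the reported percentages
# and group sizes; the abstract itself does not list them.
metrics = diagnostic_metrics(tp=137, fn=26, tn=149, fp=11)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.3f}%")
```

Under these assumed counts the computed values agree with the reported 88.54%, 93.13%, 84.05% and 92.57% to within rounding.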
Article
A review of the status of standardization of laboratory tests of particular interest to oncologists is presented. Currently, relatively few of these tests are standardized; as a result, interlaboratory and interinstitutional comparison of data is problematic. In 1992, additional interlaboratory studies of common tumor markers will be initiated by the College of American Pathologists. The National Committee for Clinical Laboratory Standards also has begun to develop standard methods and guidelines for these important tests.
Article
The human CEA family has been fully characterized. It comprises 29 genes of which 18 are expressed; 7 belonging to the CEA subgroup and 11 to the pregnancy specific glycoprotein subgroup. CEA is an important tumor marker for colorectal and some other carcinomas. The CEA subgroup members are cell membrane associated and show a complex expression pattern in normal and cancerous tissues with notably CEA showing a selective epithelial expression. Several CEA subgroup members possess cell adhesion properties and the primordial member, biliary glycoprotein, seems to function in signal transduction or regulation of signal transduction possibly in association with other CEA sub-family members. A modified ITAM/ITIM motif is identified in the cytoplasmatic domain of BGP. A role of CEA in innate immunity is envisioned.
Article
To evaluate the usefulness of tumor-marker measurements and to identify prognostic factors in patients with cancer of unknown primary (CUP), receiving platinum-based combination chemotherapy and to verify the adjustment of previously reported prognostic models in this population. We conducted univariate and multivariate analyses in consecutive patients with CUP receiving platinum-based combination chemotherapy. Previously reported prognostic models were then validated in this population. A total of 93 patients were analyzed and the response rate to platinum-based chemotherapeutic regimens among the 93 patients was 39.8%. The median time to progression and overall survival period were 4.1 and 12.4 months, respectively. The ST-439 level was significantly higher in patients with histologically confirmed adenocarcinoma than in patients with poorly differentiated adenocarcinoma or poorly differentiated carcinoma. A multivariate analysis indicated that performance status, the number of involved organs, and the serum lactate dehydrogenase level were the prognostic factors of the outcome. Both the previously reported prognostic models for predicting the duration of survival in this population were shown to be valid. Tumor-marker measurements are not helpful in the management of patients with CUP. Previously reported prognostic models may be useful for selecting indication for chemotherapy or for stratifying the patients in clinical trial.
Article
The analysis of tumour markers is based on the evaluation of data in relation to defined cut-off values. Changes in the method of determination or in the reference study group have led to different results. Cut-off-independent diagnostic evaluation of laboratory parameters can avoid laboratory-based and method-derived systematic errors. The decision guarantee (DG) is an appropriate parameter that can be determined using a defined reference population and its respective receiver operating characteristic (ROC) curve. The influence of ROC differences on the determination of DG is examined. A group of 281 consecutive patients with newly diagnosed, histologically confirmed lung cancer and a control group of 231 patients were examined. Histological classification of the tumour cases identified 59 small-cell carcinomas, 102 squamous cell carcinomas, 66 adenocarcinomas and 54 large-cell carcinomas or mixed bronchial carcinomas without classification. The control group without tumours consisted of 23 healthy subjects, 125 patients with silicosis or asbestosis, 27 with chronic obstructive pulmonary diseases (COPD) and 56 suffering from inflammatory lung diseases. Cytokeratin-19 fragments (CYFRA 21-1) was the most sensitive marker with a sensitivity of 57.3% and a specificity of 94.9%. Sensitivity and specificity influence each other. Related to the ROC curve, the method described here ensured the diagnosis of lung cancer on the basis of the data collected in comparison with a reference population. Thus, it was possible to determine with statistical certainty whether the evaluation of the sample data would lead to a diagnosis of lung cancer. The DG provides the basis for laboratory- and method-independent support for a diagnosis, including fairer information about the reference population in the data analysis.
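The interplay between sensitivity and specificity mentioned in the abstract above, and the idea of a cut-off-independent summary, can be illustrated with a small ROC computation. The Python sketch below is only an illustration under stated assumptions: the marker values are invented toy numbers, not data from the study, and it computes the ordinary ROC curve and its area, not the decision guarantee (DG), whose definition the abstract does not give. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sens_spec_at_cutoff(cases: Sequence[float], controls: Sequence[float],
                        cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when values >= cutoff are called positive."""
    sensitivity = sum(v >= cutoff for v in cases) / len(cases)
    specificity = sum(v < cutoff for v in controls) / len(controls)
    return sensitivity, specificity


def roc_points(cases: Sequence[float],
               controls: Sequence[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs for every observed cut-off."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(cases) | set(controls), reverse=True):
        sens, spec = sens_spec_at_cutoff(cases, controls, cutoff)
        points.append((1.0 - spec, sens))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# ASSUMPTION: invented toy marker values, for illustration only.
toy_cases = [1.2, 2.8, 3.5, 4.1, 6.0, 9.3]
toy_controls = [0.8, 1.1, 1.5, 2.0, 2.8, 3.6]

# Moving the cut-off trades sensitivity against specificity ...
for cutoff in (2.0, 3.0, 4.0):
    print(cutoff, sens_spec_at_cutoff(toy_cases, toy_controls, cutoff))

# ... whereas the area under the ROC curve does not depend on any single cut-off.
auc = area_under_curve(roc_points(toy_cases, toy_controls))
print("AUC:", auc)
```

Raising the cut-off in the loop lowers sensitivity while raising specificity, which is the trade-off the abstract refers to; the area under the curve summarizes the marker over all cut-offs at once.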
System Identification: Theory for the User, 2nd edition. PTR Prentice Hall, Upper Saddle River
  • L. Ljung
Exploring Medical Language: A Student-Directed Approach
  • M. LaFleur-Brooks