Article

An Explicit Functional Form Specification Approach to Estimate the Area Under a Receiver Operating Characteristic (ROC) Curve

Abstract

The Receiver Operating Characteristic (ROC) curve is plotted on a probability scale and is used to judge the discrimination ability of statistical methods used for prediction. The area under the ROC curve can be measured and converted to a single quantitative index of diagnostic accuracy. An explicit functional form approach is proposed as an alternative method for estimating the area under an ROC curve. This paper provides an explicit functional form to represent the ROC curve, together with SAS code for parameter estimation and for calculating the area under the curve. The ROC curves produced by this approach are much smoother than the empirical curves and are convex.
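The abstract does not state which functional form the authors use, so the following is only a minimal sketch of the general idea: fit a smooth parametric curve to empirical (FPR, TPR) points and integrate it in closed form. It assumes, purely for illustration, a one-parameter power curve TPR = FPR^b fitted by least squares on the log scale, and it is written in Python rather than the SAS used in the paper. The function names `fit_power_roc` and `auc_power_roc` are hypothetical.

```python
import math


def fit_power_roc(fpr, tpr):
    """Fit the assumed curve tpr = fpr ** b by least squares on the log scale.

    Illustrative form only; it is not taken from the paper. The model
    log(tpr) = b * log(fpr) has no intercept, so the estimate is
    b = sum(lx * ly) / sum(lx * lx).
    """
    logs = [(math.log(x), math.log(y))
            for x, y in zip(fpr, tpr)
            if x > 0 and y > 0]          # log is undefined at 0
    sxx = sum(lx * lx for lx, _ in logs)
    if sxx == 0:
        raise ValueError("need at least one point with 0 < fpr < 1")
    sxy = sum(lx * ly for lx, ly in logs)
    return sxy / sxx


def auc_power_roc(b):
    """Area under tpr = fpr ** b on [0, 1]: the integral equals 1 / (1 + b)."""
    return 1.0 / (1.0 + b)


# Hypothetical operating points lying on tpr = fpr ** 0.5
b_hat = fit_power_roc([0.04, 0.25, 0.64], [0.2, 0.5, 0.8])
area = auc_power_roc(b_hat)
```

With 0 < b < 1 the fitted curve is smooth and lies above the chance diagonal, and the area follows directly from the single estimated parameter instead of from a trapezoidal sum over the empirical points.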

... Using a population of 200, at each generation 200 new offspring are generated, then parents and offspring are merged into one population pool before running Pareto-based selection to select the best 200. During evolution, we aim to minimize three fitness objectives, where AUC is the area under the ROC curve, calculated using the Mann-Whitney (Stober and Yeh 2007) test, and the false positive rate (FPR) and TPR are calculated with the output threshold set using the binarization technique mentioned above: ...
Chapter
This chapter describes a general approach for image classification using Genetic Programming (GP) and demonstrates this approach through the application of GP to the task of stage 1 cancer detection in digital mammograms. We detail an automated work-flow that begins with image processing and culminates in the evolution of classification models which identify suspicious segments of mammograms. Early detection of breast cancer is directly correlated with survival of the disease and mammography has been shown to be an effective tool for early detection, which is why many countries have introduced national screening programs. However, this presents challenges, as such programs involve screening a large number of women and thus require more trained radiologists at a time when there is a shortage of these professionals in many countries. Also, as mammograms are difficult to read and radiologists typically only have a few minutes allocated to each image, screening programs tend to be conservative, involving many callbacks which increase both the workload of the radiologists and the stress and worry of patients. Fortunately, the relatively recent increase in the availability of mammograms in digital form means that it is now much more feasible to develop automated systems for analysing mammograms. Such systems, if successful, could provide a very valuable second reader function. We present a work-flow that begins by processing digital mammograms to segment them into smaller sub-images and to extract features which describe textural aspects of the breast. The most salient of these features are then used in a GP system which generates classifiers capable of identifying which particular segments may have suspicious areas requiring further investigation. An important objective of this work is to evolve classifiers which detect as many cancers as possible but which are not overly conservative. The classifiers give results of 100% sensitivity and a false positive per image rating of just 0.33, which is better than prior work. Not only this, but our system can use GP as part of a feedback loop, to both select and help generate further features.
... Using a population of 200, at each generation, 200 new offspring are generated, then parents and offspring are merged into one pool before running Pareto-based selection to select the best 200. During evolution, we aim to minimize three fitness objectives: FP Rate, 1−TP Rate and 1−AUC, where AUC is the area under the ROC curve, calculated using the Mann-Whitney [31] test. ...
Conference Paper
We present an automated, end-to-end approach for Stage 1 breast cancer detection. The first phase of our proposed work-flow takes individual digital mammograms as input and outputs several smaller sub-images from which the background has been removed. Next, we extract a set of features which capture textural information from the segmented images. In the final phase, the most salient of these features are fed into a Multi-Objective Genetic Programming system which then evolves classifiers capable of identifying those segments which may have suspicious areas that require further investigation. A key aspect of this work is the examination of several new experimental configurations which focus on textural asymmetry between breasts. The best evolved classifier using such a configuration can deliver results of 100% accuracy on true positives and a false positive per image rating of just 0.33, which is better than the current state of the art.
... There are various ways to calculate the AUC and we have chosen the Wilcoxon Mann Whitney approximation as it is easy to calculate and has the advantage that it facilitates the estimation of confidence intervals [20]. ...
Conference Paper
There have been many studies undertaken to determine the efficacy of parameters and algorithmic components of Genetic Programming, but historically, generalization considerations have not been of central importance in such investigations. Recent contributions have stressed the importance of generalisation to the future development of the field. In this paper we investigate aspects of selection bias as a component of generalisation error, where selection bias refers to the method used by the learning system to select one hypothesis over another. Sources of potential bias include the replacement strategy chosen and the means of applying selection pressure. We investigate the effects on generalisation of two replacement strategies, together with tournament selection with a range of tournament sizes. Our results suggest that larger tournaments are more prone to overfitting than smaller ones, and that a small tournament combined with a generational replacement strategy produces relatively small solutions and is least likely to over-fit.
... A perfect ROC will have an AUC of 1, whereas the ROC plot of a random classifier will result in an AUC of approximately 0.5. There are various ways to calculate the AUC and we have chosen the Wilcoxon Mann Whitney approximation as it is easy to calculate and has the advantage that it facilitates the estimation of confidence intervals [29]. ...
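The Wilcoxon-Mann-Whitney estimate mentioned in these excerpts can be sketched in a few lines of Python. This is a minimal illustration and not code from any of the cited papers: the function name `auc_mann_whitney` is hypothetical, and the convention of counting a tie as half a correct ranking is an assumption (it is the usual one).

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Each pair where the positive case scores higher counts 1, each tie
    counts 0.5 (assumed convention), and the total is divided by the
    number of pairs. This is the Mann-Whitney U statistic scaled to [0, 1].
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier outputs for positive and negative cases
perfect = auc_mann_whitney([0.9, 0.8], [0.1, 0.2])   # every pair ranked correctly
mixed = auc_mann_whitney([0.8, 0.4], [0.6, 0.2])     # 3 of 4 pairs ranked correctly
```

The double loop is quadratic in the sample size; a rank-based formulation gives the same value more efficiently for large samples, but the pairwise version makes the interpretation explicit.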
Conference Paper
For some time, Genetic Programming research has lagged behind the wider Machine Learning community in the study of generalisation, where the decomposition of generalisation error into bias and variance components is well understood. However, recent Genetic Programming contributions focusing on complexity, size and bloat as they relate to over-fitting have opened up some interesting avenues of research. In this paper, we carry out a simple empirical study on five binary classification problems. The study is designed to discover what effects may be observed when program size and complexity are varied in combination, with the objective of gaining a better understanding of relationships which may exist between solution size, operator complexity and variance error. The results of the study indicate that the simplest configuration, in terms of operator complexity, consistently results in the best average performance, and in many cases, the result is significantly better. We further demonstrate that the best results are achieved when this minimum complexity set-up is combined with a less than parsimonious permissible size.
Article
This paper empirically examined the relationship between fuel price and exchange rate in South Africa. Monthly data spanning January 2001 to December 2013 were used with the cointegration method. The Augmented Dickey-Fuller (ADF) test showed that all variables (fuel price, exchange rate and new vehicle sales) became stationary after the first difference. The results from the Johansen cointegration test indicated no cointegrating equation, indicating that the series were not cointegrated. The findings show that fuel price is affected by at least its two previous month prices. Both explanatory variable coefficients (0.541228 and -0.368649) show that fuel price will be increased by 20 cents Rand due to its previous two month prices. The results from impulsive test confirmed the VAR test results. This paper provided evidence that there was a causal link from the exchange rates to petrol price during the last sub-period. The implication therefore is that in South Africa an increase of the fuel price is a response to the Rand value fluctuations, ceteris paribus. Based on the findings of the study, policy implications and suggestions for future research are made.
Article
The study stems from the relevance of the global economic crisis which is affecting companies to an increasing extent. The objective of the paper is to test the degree of effectiveness of the insolvency prediction models, most widely used in the literature, including recent works (Jackson and Wood, 2013), with reference to Lombardy, the most important Italian region in terms of industrialization rate. The following models were used, selected according to their diffusion and the statistical technique used: 1) Discriminant analysis (Altman, 1983), (Taffler, 1983); 2) Logit Analysis (Ohlson, 1980). The study identifies the state of health of companies in 2012, using the financial reporting data of the three previous years. The research sample consists of 58,750 companies (58,367 non-failed and 383 failed). Among the main results, it is observed that, for all the models, a prediction of default is often erroneously made for companies which are solvent, whereas failed companies are classified with a lower degree of error. The objective of the paper is preparatory to the second part of the research in progress in which, on the basis of the results presented here, some modifications will be made to the insolvency prediction models selected, significant for the Italian context, with the aim of identifying a company insolvency “alert model” which can be used by the various stakeholders. The results are interpreted in the light of the Stakeholder Theory.
Conference Paper
We describe a fully automated workflow for performing stage 1 breast cancer detection with GP as its cornerstone. Mammograms are by far the most widely used method for detecting breast cancer in women, and their use in national screening can have a dramatic impact on early detection and survival rates. With the increased availability of digital mammography, it is becoming increasingly feasible to use automated methods to help with detection. A stage 1 detector examines mammograms and highlights suspicious areas that require further investigation. An overly conservative approach degenerates to marking every mammogram (or segment of one) as suspicious, while missing a cancerous area can be disastrous. Our workflow positions us right at the data collection phase such that we generate textural features ourselves. These are fed through our system, which performs PCA on them before passing the most salient ones to GP to generate classifiers. The classifiers give results of 100% accuracy on true positives and a false positive per image rating of just 1.5, which is better than prior work. Not only this, but our system can use GP as part of a feedback loop, to both select and help generate further features.
Article
The objectives of this study were (1) to ascertain the level of agreement between the Charlson Comorbidity Index (CCI) based on self-report vs. administrative records, and factors affecting agreement and (2) to compare the predictive validity of the two indices in a sample of older emergency department (ED) patients. The study was a secondary analysis of data from a randomized trial of an ED-based intervention. The self-report and administrative CCI were compared using the intraclass correlation coefficient (ICC). Factors examined for effect on agreement included health service utilization, age, and sex. The predictive validity of the indices was compared using subsequent health services utilization and functional decline as outcomes. Participants (n=520) were recruited at four university-affiliated Montreal hospitals. Eligibility criteria included 65 years of age or older, able to speak English or French, and discharged to the community. Agreement between the two sources was poor to fair (overall weighted ICC 0.43 [95% confidence interval [CI]: 0.40, 0.47]). The predictive validity was similar for the two indices (area under the receiver-operating characteristic curve 0.51-0.66, depending on the outcomes). Agreement between self-report and administrative comorbidity data is only poor to fair but both have comparable predictive validity.
Article
A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
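The abstract above does not reproduce its closed-form expression for the standard error. A minimal sketch follows, assuming the commonly quoted Hanley-McNeil approximation is the one meant; the function name `hanley_mcneil_se` and the example inputs are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC estimate.

    Assumed formula (the commonly quoted Hanley-McNeil approximation):
        Q1 = A / (2 - A)
        Q2 = 2 A^2 / (1 + A)
        Var = [A(1 - A) + (n_pos - 1)(Q1 - A^2)
                        + (n_neg - 1)(Q2 - A^2)] / (n_pos * n_neg)
    where A is the AUC, n_pos the number of diseased subjects and
    n_neg the number of non-diseased subjects.
    """
    if n_pos < 1 or n_neg < 1:
        raise ValueError("both groups need at least one subject")
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(max(var, 0.0))   # guard against tiny negative rounding


# Hypothetical study: AUC of 0.8 estimated from 50 diseased and 50 non-diseased subjects
se = hanley_mcneil_se(0.8, 50, 50)
```

Because the expression depends only on the area and the two group sizes, it can also be evaluated over a grid of candidate sample sizes to see how large a study would need to be for a target precision, which is use (b) in the abstract.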
Article
The accuracy of a medical diagnostic test is often summarized in a receiver operating characteristic (ROC) curve. This paper puts forth an interpretation for each point on the ROC curve as being a conditional probability of a test result from a random diseased subject exceeding that from a random non diseased subject. This interpretation gives rise to new methods for making inference about ROC curves. It is shown that inference can be achieved with binary regression techniques applied to indicator variables constructed from pairs of test results, one component of the pair being from a diseased subject and the other from a non diseased subject. Within the generalized linear model (GLM) binary regression framework, ROC curves can be estimated, and we highlight a new semiparametric estimator. Covariate effects can also be evaluated with the GLM models. The methodology is applied to a pancreatic cancer dataset where we use the regression framework to compare two different serum biomarkers. Asymptotic distribution theory is developed to facilitate inference and to provide insight into factors influencing variability of estimated model parameters.
Article
Receiver operating characteristic (ROC) curves are frequently used to assess the usefulness of diagnostic markers. When several diagnostic markers are available, they can be combined by a best linear combination: that is, when the area under the ROC curve of this combination is maximized among all possible linear combinations. This maximal area is the generalized ROC criterion, which provides a measure of how effective the combination of the markers is. This criterion needs to be estimated from the data, and is usually evaluated against single markers. In the present paper, we provide confidence intervals for the generalized ROC criterion under the assumption of homogeneous covariance matrices, derive an approximation for the heterogeneous covariance matrices case, and evaluate the approximation via a simulation study. Finally, we present an illustrative example.
Article
The accuracy of a medical diagnostic test is typically summarized by the sensitivity and specificity when the test result is dichotomous. Receiver operating characteristic (ROC) curves are measures of test accuracy that are used when test results are continuous and are considered the analogs of sensitivity and specificity for continuous tests. ROC regression analysis allows one to evaluate effects of factors that may influence test accuracy. Such factors might include characteristics of study subjects or operating conditions for the test. Unfortunately, regression analysis methods for ROC curves are not well developed and methods that do exist have received little use to date. In this paper, we propose and compare three very different regression analysis methods. Two are modifications of methods previously proposed for radiology settings. The third is a special case of a general method recently proposed by us. The three approaches are compared with regard to settings in which they can be applied and distributional assumptions they require. In the setting where test results are normally distributed, we elucidate the correspondence between regression parameters in the different models. The methods are applied to simulated data and to data from a study of a new diagnostic test for hearing impairment. It is hoped that the presentation in this paper will both encourage the use of regression analysis for evaluating diagnostic tests and help guide the choice of the most appropriate regression analysis approach in applications.
Article
The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Under quite natural assumptions about the latent variable underlying the test, the ROC curve is convex. Empirical data on a test's performance often comes in the form of observed true positive and false positive relative frequencies under varying conditions. This paper describes a family of regression models for analyzing such data. The underlying ROC curves are specified by a quality parameter $\delta$ and a shape parameter $\mu$ and are guaranteed to be convex provided $\delta > 1$. Both the position along the ROC curve and the quality parameter $\delta$ are modeled linearly with covariates at the level of the individual. The shape parameter $\mu$ enters the model through the link functions $\log(p^{\mu}) - \log(1 - p^{\mu})$ of a binomial regression and is estimated either by search or from an appropriate constructed variate. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro, and Littenberg (1993). A second application, to so-called vigilance data, is given, where ROC curves differ across subjects and modeling of the position along the ROC curve is of primary interest.
Article
In recent years, there has been some interest in developing a functional form of the Lorenz curve and estimating it. This note suggests an alternative form of Lorenz curve that satisfies all the properties and can be estimated by the linear least squares method using its log-linear form.
Authors Paul W. Stober (610) 917-6541 E-mail: Paul_w_stober@gsk.com
Reiser, B. and D. Faraggi, "Confidence Intervals for the Generalized ROC Criterion", Biometrics 53, pp. 644-652, 1997
Yeh, Shi-Tao, "Estimation of the Lorenz Curve and Concentration Ratio", SUGI 18 Proceedings, pp. 873-877, May 1993