Stephan Dreiseitl

Fachhochschule Oberösterreich, Wels, Upper Austria, Austria

Publications (39) · 39.99 Total impact

  • ABSTRACT: We evaluated the accuracy of diagnoses made from pictures taken with the built-in cameras of mobile phones in a 'real-life' clinical setting. A total of 263 patients took part, who photographed their own lesions where possible, and provided clinical information via a questionnaire. After the teledermatology procedure, each patient was examined face-to-face and a gold standard diagnosis was made. The telemedicine data and pictures were diagnosed by 15 dermatologists. The 299 cases contained 1-22 clinical images each (median 3). Nine dermatologists finished all the cases and the remaining six completed some of them, thus providing 2893 decisions. Overall, 61% of all cases were rated as possible to diagnose and of those, 80% were correct in comparison with the face-to-face diagnosis. Image quality was evaluated and the median was 5 on a 10-point scale. There was a significant correlation between the correct diagnosis and the quality of the photographs taken (P < 0.001). In nearly two-thirds of all cases, a teledermatology diagnosis was possible; however, there was insufficient information to make a telemedicine diagnosis in about one-third of the cases. If applied carefully, mobile phones could be a powerful tool for people to optimize their health care status.
    Journal of Telemedicine and Telecare 06/2013; 19(4):213-8. DOI:10.1177/1357633X13490890 · 1.74 Impact Factor
  • ABSTRACT: OBJECTIVE To develop a birth weight (BW), gestational age (GA), and postnatal-weight gain retinopathy of prematurity (ROP) prediction model in a cohort of infants meeting current screening guidelines. METHODS Multivariate logistic regression was applied retrospectively to data from infants born with BW less than 1501 g or GA of 30 weeks or less at a single Philadelphia hospital between January 1, 2004, and December 31, 2009. In the model, BW, GA, and daily weight gain rate were used repeatedly each week to predict risk of Early Treatment of Retinopathy of Prematurity type 1 or 2 ROP. If risk was above a cut-point level, examinations would be indicated. RESULTS Of 524 infants, 20 (4%) had type 1 ROP and received laser treatment; 28 (5%) had type 2 ROP. The model (Children's Hospital of Philadelphia [CHOP]) accurately predicted all infants with type 1 ROP; missed 1 infant with type 2 ROP, who did not require laser treatment; and would have reduced the number of infants requiring examinations by 49%. Raising the cut point to miss one type 1 ROP case would have reduced the need for examinations by 79%. Using daily weight measurements to calculate weight gain rate resulted in slightly higher examination reduction than weekly measurements. CONCLUSIONS The BW-GA-weight gain CHOP ROP model demonstrated accurate ROP risk assessment and a large reduction in the number of ROP examinations compared with current screening guidelines. As a simple logistic equation, it can be calculated by hand or represented as a nomogram for easy clinical use. However, larger studies are needed to achieve a highly precise estimate of sensitivity prior to clinical application.
    Archives of ophthalmology 12/2012; 130(12):1560-5. DOI:10.1001/archophthalmol.2012.2524 · 4.49 Impact Factor
  • M. Osl · M. Netzer · S. Dreiseitl · C. Baumgartner
    ABSTRACT: This chapter provides an overview of emerging bioinformatics methods for the biomarker discovery process and medical decision support. It introduces study design considerations and bioanalytic concepts for generating biomedical data, followed by various data mining and information retrieval procedures such as feature selection and classification, as well as statistical and clinical validation. The reviewed methods are illustrated by real examples from preclinical and clinical studies, and the application in medical decision making is discussed. This chapter is intended for readers with a bioinformatics background as well as biomedical researchers who are interested in the application of computational methods in biomarker discovery and medical decision making.
    Computational Medicine, 01/2012: pages 173-184; ISBN: 978-3-7091-0946-5
  • Stephan Dreiseitl · Melanie Osl
    ABSTRACT: The accurate assessment of the calibration of classification models is severely limited by the fact that there is no easily available gold standard against which to compare a model's outputs. The usual procedures group expected and observed probabilities, and then perform a χ² goodness-of-fit test. We propose an entirely new approach to calibration testing that can be derived directly from the first principles of statistical hypothesis testing. The null hypothesis is that the model outputs are correct, i.e., that they are good estimates of the true unknown class membership probabilities. Our test calculates a p-value by checking how (im)probable the observed class labels are under the null hypothesis. We demonstrate by experiments that our proposed test performs comparably to, and sometimes even better than, the Hosmer-Lemeshow goodness-of-fit test, the de facto standard in calibration assessment.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2012; 2012:164-9.
  • Stephan Dreiseitl · Maja Pivec · Michael Binder
    ABSTRACT: To use computer-based eye tracking technology to record and evaluate examination characteristics of the diagnosis of pigmented skin lesions. 16 study participants with varying levels of diagnostic expertise (little, intermediate, superior) were recorded while diagnosing a series of 28 digital images of pigmented skin lesions, obtained by non-invasive digital dermatoscopy, on a computer screen. Eye tracking hardware recorded the gaze track and fixations of the physicians while they examined the lesion images. Analysis of variance was used to test for differences in examination characteristics between physicians grouped according to expertise. There were no significant differences between physicians with little and intermediate levels of expertise in terms of average time until diagnosis (6.61 vs. 6.19s), gaze track length (6.65 vs. 6.15 kilopixels), number of fixations (23.1 vs. 19.1), and time in fixations (4.91 vs. 4.17s). The experts were significantly different with 3.17s time until diagnosis, 4.53 kilopixels gaze track length, 9.9 fixations, and 1.74s in fixations, respectively. Differentiation between benign and malignant lesions had no effect on examination measurements. The results show that experience level has a significant impact on the way in which lesion images are examined. This finding can be used to construct decision support systems that employ important diagnostic features identified by experts, and to optimize teaching for less experienced physicians.
    Artificial intelligence in medicine 12/2011; 54(3):201-5. DOI:10.1016/j.artmed.2011.11.004 · 1.36 Impact Factor
  • Melanie Osl · Stephan Dreiseitl
    ABSTRACT: Acute myocardial infarction is one of the most common cardiovascular diseases in the Western world. Fortunately, not all myocardial infarctions are fatal. By early diagnosis of acute myocardial infarction based on symptoms at a patient’s presentation in the emergency department, the number of deaths may be further reduced, as life-saving actions can be taken sooner. In this paper, we investigate the application of kernel-based methods to this problem, i.e. we evaluate the performance of support vector machines and kernel logistic regression models and compare these two methods to logistic regression models in terms of discrimination and calibration. The results show that kernel-based methods have higher discriminatory power for early diagnosis of acute myocardial infarction than logistic regression models and that kernel logistic regression models have superior calibration in comparison to logistic regression models and support vector machines.
  • ABSTRACT: To develop an efficient clinical prediction model that includes postnatal weight gain to identify infants at risk of developing severe retinopathy of prematurity (ROP). Under current birth weight (BW) and gestational age (GA) screening criteria, <5% of infants examined in countries with advanced neonatal care require treatment. This study was a secondary analysis of prospective data from the Premature Infants in Need of Transfusion Study, which enrolled 451 infants with a BW < 1000 g at 10 centers. There were 367 infants who remained after excluding deaths (82) and missing weights (2). Multivariate logistic regression was used to predict severe ROP (stage 3 or treatment). Median BW was 800 g (445-995). There were 67 (18.3%) infants who had severe ROP. The model included GA, BW, and daily weight gain rate. Run weekly, an alarm that indicated need for eye examinations occurred when the predicted probability of severe ROP was >0.085. This identified 66 of 67 severe ROP infants (sensitivity of 99% [95% confidence interval: 94%-100%]), and all 33 infants requiring treatment. Median alarm-to-outcome time was 10.8 weeks (range: 1.9-17.6). There were 110 (30%) infants who had no alarm. Nomograms were developed to determine risk of severe ROP by BW, GA, and postnatal weight gain. In a high-risk cohort, a BW-GA-weight-gain model could have reduced the need for examinations by 30%, while still identifying all infants requiring laser surgery. Additional studies are required to determine whether including larger-BW, lower-risk infants would reduce examinations further and to validate the prediction model and nomograms before clinical use.
    PEDIATRICS 02/2011; 127(3):e607-14. DOI:10.1542/peds.2010-2240 · 5.30 Impact Factor
  • Stephan Dreiseitl · Melanie Osl
    Computer Aided Systems Theory - EUROCAST 2011 - 13th International Conference, Las Palmas de Gran Canaria, Spain, February 6-11, 2011, Revised Selected Papers, Part I; 01/2011
  • Stephan Dreiseitl · Melanie Osl · Christian Baumgartner · Staal Vinterbo
    ABSTRACT: To evaluate and compare the performance of different rule-ranking algorithms for rule-based classifiers on biomedical datasets. Empirical evaluation of five rule ranking algorithms on two biomedical datasets, with performance evaluation based on ROC analysis and 5 × 2 cross-validation. On a lung cancer dataset, the area under the ROC curve (AUC) of, on average, 14267.1 rules was 0.862. Multi-rule ranking found 13.3 rules with an AUC of 0.852. Four single-rule ranking algorithms, using the same number of rules, achieved average AUC values of 0.830, 0.823, 0.823, and 0.822, respectively. On a prostate cancer dataset, an average of 339265.3 rules had an AUC of 0.934, while 9.4 rules obtained from multi-rule and single-rule rankings had average AUCs of 0.932, 0.926, 0.925, 0.902 and 0.902, respectively. Multi-variate rule ranking performs better than the single-rule ranking algorithms. Both single-rule and multi-rule methods are able to substantially reduce the number of rules while keeping classification performance at a level comparable to the full rule set.
    Artificial intelligence in medicine 05/2010; 50(3):175-80. DOI:10.1016/j.artmed.2010.03.005 · 1.36 Impact Factor
  • ABSTRACT: The quality of predictive modeling in biomedicine depends on the amount of data available for model building. To study the effect of combining microarray data sets on feature selection and predictive modeling performance. Empirical evaluation of stability of feature selection and discriminatory power of classifiers using three previously published gene expression data sets, analyzed both individually and in combination. Feature selection was not robust for either the individual or the combined data sets. The classification performance of models built on individual and combined data sets was heavily dependent on the data set from which the features were extracted. We identified volatility of feature selection as a contributing factor to some of the problems faced by predictive modeling using microarray data.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2010; 2010:567-71.
  • Stephan Dreiseitl · Melanie Osl · Christian Scheibböck · Michael Binder
    ABSTRACT: Medical diagnosis and prognosis using machine learning methods is usually represented as a supervised classification problem, where a model is built to distinguish "normal" from "abnormal" cases. If cases are available from only one class, this approach is not feasible. To evaluate the performance of classification via outlier detection by one-class support vector machines (SVMs) as a means of identifying abnormal cases in the domain of melanoma prognosis. Empirical evaluation of one-class SVMs on a data set for predicting the presence or absence of metastases in melanoma patients, and comparison with regular SVMs and artificial neural networks. One-class SVMs achieve an area under the ROC curve (AUC) of 0.71; two-class algorithms achieve AUCs between 0.5 and 0.84, depending on the available number of cases from the minority class. One-class SVMs offer a viable alternative to two-class classification algorithms if class distribution is heavily imbalanced.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2010; 2010:172-6.
  • M Osl · S Dreiseitl · F Cerqueira · M Netzer · B Pfeifer · C Baumgartner
    ABSTRACT: The identification of a set of relevant but not redundant features is an important first step in building predictive and diagnostic models from biomedical data sets. Most commonly, individual features are ranked in terms of a quality criterion, out of which the best (first) k features are selected. However, feature ranking methods do not sufficiently account for interactions and correlations between the features. Thus, redundancy is likely to be encountered in the selected features. We present a new algorithm, termed Redundancy Demoting (RD), that takes an arbitrary feature ranking as input, and improves this ranking by identifying redundant features and demoting them to positions in the ranking in which they are not redundant. Redundant features are those that are correlated with other features and not relevant in the sense that they do not improve the discriminatory ability of a set of features. Experiments on two cancer data sets, one melanoma image data set and one lung cancer microarray data set, show that our algorithm greatly improves the feature rankings provided by the methods information gain, ReliefF and Student's t-test in terms of predictive power.
    Journal of Biomedical Informatics 06/2009; 42(4):721-5. DOI:10.1016/j.jbi.2009.05.006 · 2.48 Impact Factor
  • Stephan Dreiseitl · Michael Binder · Krispin Hable · Harald Kittler
    ABSTRACT: The aim of this study was to evaluate the accuracy of a computer-based system for the automated diagnosis of melanoma in the hands of nonexpert physicians. We performed a prospective comparison between nonexperts using computer assistance and experts without assistance in the setting of a tertiary referral center at a University hospital. Between February and November 2004 we enrolled 511 consecutive patients. Each patient was examined by two nonexpert physicians with low to moderate diagnostic skills who were allowed to use a neural network-based diagnostic system at their own discretion. Every patient was also examined by an expert dermatologist using standard dermatoscopy equipment. The nonexpert physicians used the automatic diagnostic system in 3827 pigmented skin lesions. In their hands, the system achieved a sensitivity of 72% and a specificity of 82%. The sensitivity was significantly lower than that of the expert physician (72 vs. 96%, P = 0.001), whereas the specificity was significantly higher (82 vs. 72%, P<0.01). Three melanomas were missed because the physicians who operated the system did not choose them for examination. The system as a stand-alone device had an average discriminatory power of 0.87, as measured by the area under the receiver operating characteristic curve, with optimal sensitivities and specificities of 75 and 84%, respectively. The diagnostic accuracy achieved in this clinical trial was lower than that achieved in a previous experimental trial of the same system. In total, the performance of a decision-support system for melanoma diagnosis under real-life conditions is lower than that expected from experimental data and depends upon the physicians who are using the system.
    Melanoma research 04/2009; 19(3):180-4. DOI:10.1097/CMR.0b013e32832a1e41 · 2.10 Impact Factor
  • Christian Scheibböck · Stephan Dreiseitl · Michael Binder
    Proceedings of the Second International Conference on Health Informatics, HEALTHINF 2009, Porto, Portugal, January 14-17, 2009; 01/2009
  • Stephan Dreiseitl · Melanie Osl
    ABSTRACT: The process of feature selection is an important first step in building machine learning models. Feature selection algorithms can be grouped into wrappers and filters; the former use machine learning models to evaluate feature sets, the latter use other criteria to evaluate features individually. We present a new approach to feature selection that combines advantages of both wrapper and filter approaches, by using logistic regression and the area under the ROC curve (AUC) to evaluate pairs of features. After choosing as the starting feature the one with the highest individual discriminatory power, we incrementally rank features by choosing as the next feature the one that achieves the highest AUC in combination with an already chosen feature. To evaluate our approach, we compared it to standard filter and wrapper algorithms. Using two data sets from the biomedical domain, we are able to demonstrate that the performance of our approach exceeds that of filter methods, while being comparable to wrapper methods at smaller computational cost.
    Computer Aided Systems Theory - EUROCAST 2009, 12th International Conference, Las Palmas de Gran Canaria, Spain, February 15-20, 2009, Revised Selected Papers; 01/2009
  • M. Osl · C. Baumgartner · B. Tilg · S. Dreiseitl
    ABSTRACT: Classifiers based on parametric or non-parametric learning methods have different advantages and disadvantages. To take advantage of the strengths of both methods, we propose an algorithm that combines a parametric model (logistic regression) with a non-parametric classification method (k-nearest neighbors). This combination is based on a measure of appropriateness that uses a heuristic to decide which of the two components should contribute more to the final classification output. We measure the performance of this combination method on two data sets (one from medical informatics, and one consisting of simulated data) in terms of areas under the ROC curves (AUCs). We are able to demonstrate that our method of combining classifiers exceeds the performance of both individual classifiers taken separately.
    Broadband Communications, Information Technology & Biomedical Applications, 2008 Third International Conference on; 12/2008
  • ABSTRACT: Prostate cancer is the most prevalent tumor in males and its incidence is expected to increase as the population ages. Prostate cancer is treatable by excision if detected at an early enough stage. The challenges of early diagnosis require the discovery of novel biomarkers and tools for prostate cancer management. We developed a novel feature selection algorithm, termed associative voting (AV), for identifying biomarker candidates in prostate cancer data measured via targeted metabolite profiling MS/MS analysis. We benchmarked our algorithm against two standard entropy-based and correlation-based feature selection methods [Information Gain (IG) and ReliefF (RF)] and observed that, on a variety of classification tasks in prostate cancer diagnosis, our algorithm identified subsets of biomarker candidates that are both smaller and show higher discriminatory power than the subsets identified by IG and RF. A literature study confirms that the highest ranked biomarker candidates identified by AV have independently been identified as important factors in prostate cancer development. The algorithm can be downloaded from http://biomed.umit.at/page.cfm?pageid=516.
    Bioinformatics 10/2008; 24(24):2908-14. DOI:10.1093/bioinformatics/btn506 · 4.62 Impact Factor
  • ABSTRACT: Objective: To improve the calibration of logistic regression (LR) estimates using local information. Background: Individualized risk assessment tools are increasingly being utilized. External validation of these tools often reveals poor model calibration. Methods: We combine a clustering algorithm with an LR model to produce probability estimates that are close to the true probabilities for a particular case. The new method is compared to a standard LR model in terms of calibration, as measured by the sum of absolute differences (SAD) between model estimates and true probabilities, and discrimination, as measured by area under the ROC curve (AUC). Results: We evaluate the new method on two synthetic data sets. SADs are significantly lower (p < 0.0001) in both data sets, and AUCs are significantly higher in one data set (p < 0.01). Conclusion: The results suggest that the proposed method may be useful to improve the calibration of LR models.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 01/2008;
  • Stephan Dreiseitl · Michael Binder · Staal Vinterbo · Harald Kittler
    ABSTRACT: The work reported in this paper investigates the use of a decision-support tool for the diagnosis of pigmented skin lesions in a real-world clinical trial with 511 patients and 3827 lesion evaluations. We analyzed a number of outcomes of the trial, such as direct comparison of system performance in laboratory and clinical setting, the performance of physicians using the system compared to a control dermatologist without the system, and repeatability of system recommendations. The results show that system performance was significantly less in the real-world setting compared to the laboratory setting (c-index of 0.87 vs. 0.94, p = 0.01). Dermatologists using the system achieved a combined sensitivity of 85% and combined specificity of 95%. We also show that the process of acquiring lesion images using digital dermoscopy devices needs to be standardized before sufficiently high repeatability of measurements can be assured.
    AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium 02/2007;
  • International Conference on Bioinformatics & Computational Biology, BIOCOMP 2007, Volume II, June 25-28, 2007, Las Vegas Nevada, USA; 01/2007
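
The calibration test summarized in the Dreiseitl and Osl abstract above (AMIA Symposium, 2012) derives a p-value from how improbable the observed class labels are under the null hypothesis that the model outputs are the true class membership probabilities. The abstract does not state the test statistic, so the following is only a minimal Monte Carlo sketch of that general idea. It assumes the log-likelihood of the labels as the statistic; the function names, the simulation count, and the add-one correction are illustrative choices and not the authors' published method.

```python
import math
import random


def log_likelihood(probs, labels):
    """Log-likelihood of binary labels under the predicted probabilities."""
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def calibration_p_value(probs, labels, n_sim=2000, seed=0):
    """Monte Carlo p-value for H0: the predicted probabilities are correct.

    Assumption (not taken from the abstract): the test statistic is the
    log-likelihood of the labels, and low values count as evidence
    against H0.
    """
    rng = random.Random(seed)
    observed = log_likelihood(probs, labels)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        # Draw labels as if the model outputs were the true probabilities.
        simulated = [1 if rng.random() < p else 0 for p in probs]
        if log_likelihood(probs, simulated) <= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    # A model that predicts 0.9 for cases that are all negative is
    # badly calibrated, so the p-value should come out small.
    print(calibration_p_value([0.9] * 100, [0] * 100))
```

Unlike the grouping-based procedures mentioned in the abstract, a simulation of this kind needs no binning of expected and observed probabilities, at the cost of run time that grows with the number of simulated label sets.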