Georg Heinze

Medical University of Vienna | MedUni Vienna · Section for Clinical Biometrics

PhD

About

471
Publications
73,822
Reads
19,209
Citations
Citations since 2017
125 Research Items
11,733 Citations
Introduction
Georg Heinze currently works at the Section for Clinical Biometrics, Medical University of Vienna. His primary research focuses on biostatistical regression modeling strategies for prediction and estimation of effects of exposures on outcomes, particularly when sample sizes are small or outcome events are rare. His secondary research focus is the re-use of health data for medical research, particularly when sample sizes are very large, as in nationwide studies on health insurance claims. He is also interested in providing statistical software for routine application of his group's methodological developments. He has collaborated as a biostatistical partner in several EU-funded projects.
Additional affiliations
December 2015 - present
Medical University of Vienna
Position
  • Section head

Publications (471)
Article
Full-text available
Randomization is an effective design option to prevent bias from confounding in the evaluation of the causal effect of interventions on outcomes. However, in some cases, randomization is not possible, making subsequent adjustment for confounders essential to obtain valid results. Several methods exist to adjust for confounding, with multivariable m...
Article
Full-text available
Although new biostatistical methods are published at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similar to the well‐known phases of clinical research in drug...
Article
Importance Kidney transplant is considered beneficial in terms of survival compared with continued dialysis for patients with kidney failure. However, randomized clinical trials are infeasible, and available evidence from cohort studies is at high risk of bias. Objective To compare restricted mean survival times (RMSTs) between patients who underw...
Preprint
Although the biostatistical scientific literature publishes new methods at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similarly to the well-known phases of cl...
Article
Full-text available
Background Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and replicability issues. Methods We compared proposals...
Article
Full-text available
Background Some capability dimensions may be more important than others in determining someone’s well-being, and these preferences might be dependent on ill-health experience. This study aimed to explore the relative preference weights of the 16 items of the German language version of the OxCAP-MH (Oxford Capability questionnaire-Mental Health) cap...
Article
Full-text available
There is an increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to automatically discover complex data patterns. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. In addition, researchers may also use...
Article
Full-text available
Background In binary logistic regression, data are ‘separable’ if there exists a linear combination of explanatory variables which perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. A popular solution to obtain finite estimates even with separable data is Firth’s logistic regres...
Article
Full-text available
The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of such AI-based prediction model tools and software in cardiovascular patient care, the cardiovascular researcher and healthcare professional are challenged to understand the opportunities as well as the lim...
Article
Full-text available
A common view in epidemiology is that automated confounder selection methods, such as backward elimination, should be avoided as they can lead to biased effect estimates and underestimation of their variance. Nevertheless, backward elimination remains regularly applied. We investigated if and under which conditions causal effect estimation in obser...
Article
Full-text available
Background Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative forms of covariate representations for the features derived from these modern data modalities should be considere...
Preprint
Full-text available
Background Some capability dimensions may be more important than others in determining someone’s well-being, and these preferences might be dependent on ill-health experience. This study aimed to explore the relative preference weights of the 16 items of the German language version of the OxCAP-MH (Oxford Capability questionnaire-Mental Health) cap...
Article
In this commentary, we discuss the analysis of trajectories of pulse wave velocity in a longitudinal cohort study of children with chronic kidney disease (the Cardiovascular Comorbidity in Children with Chronic Kidney Disease – Transplantation study). We revisit the analysis made by the study authors and unravel some additional limitations. We also...
Preprint
Full-text available
Background In binary logistic regression, data are ‘separable’ if there exists a linear combination of explanatory variables which perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. A popular solution to obtain finite estimates even with separable data is Firth’s logistic regres...
Article
Full-text available
Objective: To identify and critically appraise risk prediction models for living donor solid organ transplant counselling. Study design and setting: We systematically reviewed articles describing the development or validation of prognostic risk prediction models about living donor solid organ (kidney and liver) transplantation indexed in Medline...
Article
Objective In a previous phase II trial, we showed that topical imiquimod (IMQ) therapy is an efficacious treatment for high-grade squamous intraepithelial lesion (HSIL). The aim of the present study was to investigate the non-inferiority of a 16-week topical, self-applied IMQ therapy compared to large loop excision of the transformation zone (LLETZ) in...
Article
Full-text available
Although regression models play a central role in the analysis of medical research projects, there still exist many misconceptions on various aspects of modeling leading to faulty analyses. Indeed, the rapidly developing statistical methodology and its recent advances in regression modeling do not seem to be adequately reflected in many medical pub...
Article
Full-text available
Poisson regression can be challenging with sparse data, in particular with certain data constellations where maximum likelihood estimates of regression coefficients do not exist. This paper provides a comprehensive evaluation of methods that give finite regression coefficients when maximum likelihood estimates do not exist, including Firth’s genera...
Article
Full-text available
Background While machine learning (ML) algorithms may predict cardiovascular outcomes more accurately than statistical models, their result is usually not representable by a transparent formula. Hence, it is often unclear how specific values of predictors lead to the predictions. We aimed to demonstrate with graphical tools how predictor-risk relat...
Article
Full-text available
Background Chronic kidney disease (CKD) is a well-established complication in people with diabetes mellitus. Roughly one quarter of prevalent patients with diabetes exhibit a CKD stage of 3 or higher and the individual course of progression is highly variable. Therefore, there is a clear need to identify patients at high risk for fast progression a...
Article
Full-text available
Hospital length of stay (LOS) is an important clinical and economic outcome and knowing its predictors could lead to better planning of resources needed during hospitalization. This analysis sought to identify structure, patient, and nutrition-related predictors of LOS available at the time of admission in the global nutritionDay dataset and to ana...
Article
Introduction: Patients with unprovoked venous thromboembolism (VTE) have a high recurrence risk, and, according to guidelines, should receive extended oral anticoagulation (OAC). OAC prevents recurrence in most patients but may cause major bleeding. Patients with a low recurrence risk could therefore benefit from limited OAC duration. The Vienna pr...
Article
Full-text available
Background For finite samples with binary outcomes, penalized logistic regression such as ridge logistic regression has the potential of achieving smaller mean squared errors (MSE) of coefficients and predictions than maximum likelihood estimation. There is evidence, however, that ridge logistic regression can result in highly variable calibration s...
Article
Full-text available
Background Statistical model building requires selection of variables for a model depending on the model’s aim. In descriptive and explanatory models, a common recommendation often met in the literature is to include all variables in the model which are assumed or known to be associated with the outcome independent of their identification with data...
Article
Full-text available
Health care claims databases maintained by social insurance institutions provide rich and sometimes easily accessible data sources for epidemiological research. Interpreting the registered claims, for example, drug prescriptions, as proxies for the condition of interest, for example, diabetes, they allow for nationwide prevalence estimation. We ill...
Article
Background and Aims Kidney transplantation is considered to be the optimal treatment strategy for eligible end stage renal disease patients. However, the body of evidence to underpin the anticipated survival advantage for kidney transplant recipients is weak, as random treatment allocation to either kidney transplantation or remaining on dialysis i...
Article
Full-text available
Regression models have been in use for decades to explore and quantify the association between a dependent response and several independent variables in environmental sciences, epidemiology and public health. However, researchers often encounter situations in which some independent variables exhibit high bivariate correlation, or may even be collin...
Article
Full-text available
Background The use of potentially inappropriate medication (PIM) in the population of older adults may result in adverse drug events (ADE) even after short-term exposure, especially when it is prescribed to patients with chronic kidney disease (CKD). In order to limit ADE in the treatment of older adults, PIM lists have been constructed as a source o...
Article
Full-text available
Background: The induction of donor-specific immunological tolerance could improve outcome after kidney transplantation. However, no tolerance protocol is available for routine clinical use. Chimerism-based regimens hold promise, but their widespread application is impeded in part by unresolved safety issues. This study tests the hypothesis that the...
Preprint
For finite samples with binary outcomes, penalized logistic regression such as ridge logistic regression (RR) has the potential of achieving smaller mean squared errors (MSE) of coefficients and predictions than maximum likelihood estimation. There is evidence, however, that RR is sensitive to small or sparse data situations, yielding poor performan...
Preprint
Penalized logistic regression methods are frequently used to investigate the relationship between a binary outcome and a set of explanatory variables. The model performance can be assessed by measures such as the concordance statistic (c-statistic), the discrimination slope and the Brier score. Often, data resampling techniques, e.g. crossvalidatio...
Preprint
Firth-type logistic regression has become a standard approach for the analysis of binary outcomes with small samples. Whereas it reduces the bias in maximum likelihood estimates of coefficients, bias towards 1/2 is introduced in the predicted probabilities. The stronger the imbalance of the outcome, the more severe is the bias in the predicted prob...
Article
Full-text available
In the last decades, statistical methodology has developed rapidly, in particular in the field of regression modeling. Multivariable regression models are applied in almost all medical research projects. Therefore, the potential impact of statistical misconceptions within this field can be enormous. Indeed, the current theoretical statistical knowle...
Article
Full-text available
Background: How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years a...
Preprint
Separation in logistic regression is a common problem causing failure of the iterative estimation process when finding maximum likelihood estimates. Firth's correction (FC) was proposed as a solution, providing estimates also in presence of separation. In this paper we evaluate whether ridge regression (RR) could be considered instead, specifically...
Poster
Background Secondary prevention after acute coronary syndrome (ACS) holds a key position in the reduction of morbidity and mortality in this highly vulnerable patient population. In particular, cardiac rehabilitation proved to be one of the most beneficial therapeutic approaches for the reduction of re-events and overall modification of cardiovascula...
Poster
Background Secondary prevention after acute coronary syndrome (ACS) holds a key position in the reduction of morbidity and mortality in this highly vulnerable patient population. In particular, lipid lowering therapy – via high-intensity statins (atorvastatin and rosuvastatin) – proved to be one of the most beneficial therapeutic approaches for the r...
Poster
Background Secondary prevention after acute coronary syndrome (ACS) holds a key position in the reduction of morbidity and mortality in this highly vulnerable patient population. In particular, dual anti-platelet therapy (DAPT) – including aspirin plus a P2Y12 inhibitor – proved to be one of the most beneficial therapeutic approaches for the reductio...
Article
Full-text available
Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability and selection...
Article
Objective: Postscreening colorectal cancer (PSCRC) after screening colonoscopy is associated with endoscopists' performance and characteristics of resected lesions. Prior studies have shown that adenoma detection rate (ADR) is a decisive factor for PSCRC, but correlations with other parameters need further analysis and ADR may change over time. D...
Poster
Full-text available
Causal inference from observational studies can be challenging with a rare outcome event and many potential confounding variables. The probability of an individual to receive the treatment given the patient’s information, known as the propensity score, can be used in the process of matching or weighting the observational data to combat the inherent...
Poster
Full-text available
Release of medical data is important in the scientific world, but it compromises patient privacy, which is a major concern. Releasing perturbed versions of the original data sets might preserve some degree of patient privacy, but more privacy leads to less utility. With proteomic biomarker data, perturbation is complicated by zero-inflated and non-...
Article
Full-text available
Objectives Risk prediction in implant dentistry presents specific challenges including the dependence of observations from patients with multiple implants and rare outcome events. The aim of this study was to use advanced statistical methods based on penalized regression to assess risk factors in implant dentistry. Material and Methods We conducte...
Article
Background To compare open repair (OR) with EVAR for the management of ruptured infrarenal abdominal aortic aneurysms (RAAA) in a cohort study over a time period of 15 years with inverse probability of treatment weights. Material and Methods From 2000/01 through 2015/12, 136 patients were treated for RAAA, 98 (72.1%) underwent OR, 38 (27.9%) were t...
Article
Full-text available
Equations predicting the risk of occurrence of cardiovascular disease (CVD) are used in primary care to identify high-risk individuals among the general population. To improve the predictive performance of such equations, we updated the Framingham general CVD 1991 and 2008 equations and the Pooled Cohort equations for atherosclerotic CVD within fiv...
Preprint
Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory which assumes a fixed set of covariates in the model. We review two interpretations of inference after selection: the full model view, in which the parameters of...
Article
Full-text available
Objective To review and critically appraise published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at risk of being admitted to hospital for covid-19 pneumonia. Design Rap...
Article
Full-text available
Generic medications offer substantial potential cost savings to health systems compared to their branded counterparts. In Europe and the US, they are only approved if they are bioequivalent to the respective originator product. Nevertheless, the lack of clinical outcomes is sometimes used as the reason for hesitancy in prescribing generics. We perf...
Preprint
Full-text available
Objective: To review and critically appraise published and preprint reports of models that aim to predict either (i) presence of existing COVID-19 infection, or (ii) future complications in individuals already diagnosed with COVID-19. Any models to identify subjects at risk for COVID-19 in the general population were also included. Design: Rapid sy...
Article
Full-text available
The recent discussion on the reproducibility of scientific results is particularly relevant for preclinical research with animal models. Within certain areas of preclinical research, there exists the tradition of repeating an experiment at least twice to demonstrate replicability. If the results of the first two experiments do not agree, then the e...
Article
Full-text available
Background: Although separate prediction models for donors and recipients were previously published, we identified a need to predict outcomes of donor/recipient simultaneously, as they are clearly not independent of each other. Methods: We used characteristics from transplantations performed at the Oslo University Hospital from 1854 live donors,...
Article
Full-text available
Purpose: Overactive bladder (OAB) syndrome has severe effects on quality of life. Certain drugs are known risk factors for OAB but have not been investigated in a population-wide cohort. The objective of this study was to investigate the role of prescription drugs in the etiology of OAB. Methods: Retrospective cohort study using a population...
Article
Full-text available
The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed ‘separation’ as the two outcome groups are separated by the values of a covariate...
Conference Paper
PURPOSE: Although the discovery of prions was rewarded with a Nobel Prize, their existence was only attributed to a limited number of diseases. Recent evidence suggests that their role has been underestimated and several other proteins carry prion-like properties, like β-amyloid, and most recently p53. High-grade serous ovarian cancers (HGSOC) harb...
Preprint
The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed “separation” as the two outcome groups are separated by the values of a covariate...
Article
Full-text available
Chronic kidney disease (CKD) may progress to end-stage renal disease (ESRD) at a different pace. Early markers of disease progression could facilitate and improve patient management. However, conventional blood and urine chemistry have proven unable to predict the progression of disease at early stages. Therefore, we performed untargeted plasma pepti...
Article
Full-text available
Most research in transplant medicine includes statistical analysis of observed data. Too often authors solely rely on p‐values derived by statistical tests to answer their research questions. A p‐value smaller than 0.05 is typically used to declare ‘statistical significance’ and hence, ‘proves’ that, e.g., an intervention has an effect on the outco...
Article
Objectives: The aim of this study was to determine stroke rates in patients who did or did not undergo routine computed tomography angiography (CTA) aortic imaging before isolated coronary artery bypass grafting (CABG). Methods: We conducted a retrospective analysis of a prospectively maintained single-centre registry. Between 2009 and 2016, a t...
Article
Clinical risk factors explain only a fraction of the variability of estimated glomerular filtration rate (eGFR) decline in people with type 2 diabetes. Cross-omics technologies by virtue of a wide spectrum screening of plasma samples have the potential to identify biomarkers for the refinement of prognosis in addition to clinical variables. Here we...
Preprint
Full-text available
How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More rece...
Article
Full-text available
The sphingolipid and lysophosphatidate regulatory networks impact diverse mechanisms attributed to cancer cells and the tumor immune microenvironment. Deciphering the complexity demands implementation of a holistic approach combined with higher-resolution techniques. We implemented a multi-modular integrative approach consolidating the latest accom...