About
471 Publications
73,822 Reads
19,209 Citations
Introduction
Georg Heinze currently works at the Section for Clinical Biometrics, Medical University of Vienna.
His primary research focuses on biostatistical regression modeling strategies for prediction and estimation of effects of exposures on outcomes, particularly when sample sizes are small or outcome events are rare. His secondary research focus is the re-use of health data for medical research, particularly when sample sizes are very large, as in nationwide studies on health insurance claims. He is also interested in providing statistical software for routine application of his group's methodological developments. He has collaborated as biostatistical partner in several EU-funded projects.
Additional affiliations
December 2015 - present
Publications (471)
Randomization is an effective design option to prevent bias from confounding in the evaluation of the causal effect of interventions on outcomes. However, in some cases, randomization is not possible, making subsequent adjustment for confounders essential to obtain valid results. Several methods exist to adjust for confounding, with multivariable m...
Although new biostatistical methods are published at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similar to the well‐known phases of clinical research in drug...
Importance
Kidney transplant is considered beneficial in terms of survival compared with continued dialysis for patients with kidney failure. However, randomized clinical trials are infeasible, and available evidence from cohort studies is at high risk of bias.
Objective
To compare restricted mean survival times (RMSTs) between patients who underw...
Although the biostatistical scientific literature publishes new methods at a very high rate, many of these developments are not trustworthy enough to be adopted by the scientific community. We propose a framework to think about how a piece of methodological work contributes to the evidence base for a method. Similarly to the well-known phases of cl...
Background
Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory, which assumes a fixed set of covariates in the model. This leads to over-optimistic selection and replicability issues.
Methods
We compared proposals...
Background
Some capability dimensions may be more important than others in determining someone’s well-being, and these preferences might be dependent on ill-health experience. This study aimed to explore the relative preference weights of the 16 items of the German language version of the OxCAP-MH (Oxford Capability questionnaire-Mental Health) cap...
There is an increasing interest in machine learning (ML) algorithms for predicting patient outcomes, as these methods are designed to automatically discover complex data patterns. For example, the random forest (RF) algorithm is designed to identify relevant predictor variables out of a large set of candidates. In addition, researchers may also use...
Background
In binary logistic regression data are ‘separable’ if there exists a linear combination of explanatory variables which perfectly predicts the observed outcome, leading to non-existence of some of the maximum likelihood coefficient estimates. A popular solution to obtain finite estimates even with separable data is Firth’s logistic regres...
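The definition of separability above can be illustrated for the simplest case of a single explanatory variable. The following is a minimal sketch, not the method of the paper: the function name `separation_type` and the restriction to one covariate are assumptions made for illustration. With one covariate, a separating linear combination exists exactly when the covariate values of the two outcome groups do not overlap, or overlap only at a boundary value.

```python
def separation_type(x, y):
    """Classify separation of a binary outcome y (0/1) by one covariate x.

    Hypothetical helper for illustration only; it handles a single
    covariate, not a general linear combination of several covariates.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    if len(set(x)) == 1:
        # A constant covariate cannot separate anything.
        return "none"
    # Complete separation: a threshold splits the two groups perfectly.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"
    # Quasi-complete separation: the groups touch only at a boundary value.
    if max(x0) <= min(x1) or max(x1) <= min(x0):
        return "quasi-complete"
    return "none"


if __name__ == "__main__":
    print(separation_type([1, 2, 3, 4], [0, 0, 1, 1]))
    print(separation_type([1, 2, 2, 3], [0, 0, 1, 1]))
    print(separation_type([1, 3, 2, 4], [0, 0, 1, 1]))
```

In the first two cases at least one maximum likelihood coefficient estimate does not exist, which is the situation the penalized approach mentioned above is meant to handle.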
The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of such AI-based prediction model tools and software in cardiovascular patient care, the cardiovascular researcher and healthcare professional are challenged to understand the opportunities as well as the lim...
A common view in epidemiology is that automated confounder selection methods, such as backward elimination, should be avoided as they can lead to biased effect estimates and underestimation of their variance. Nevertheless, backward elimination remains regularly applied. We investigated if and under which conditions causal effect estimation in obser...
Background
Recent advances in biotechnology enable the acquisition of high-dimensional data on individuals, posing challenges for prediction models which traditionally use covariates such as clinical patient characteristics. Alternative forms of covariate representations for the features derived from these modern data modalities should be considere...
In this commentary, we discuss the analysis of trajectories of pulse wave velocity in a longitudinal cohort study of children with chronic kidney disease (the Cardiovascular Comorbidity in Children with Chronic Kidney Disease – Transplantation study). We revisit the analysis made by the study authors and unravel some additional limitations. We also...
Objective:
To identify and critically appraise risk prediction models for living donor solid organ transplant counselling.
Study design and setting:
We systematically reviewed articles describing the development or validation of prognostic risk prediction models about living donor solid organ (kidney and liver) transplantation indexed in Medline...
Objective
In a previous phase II trial, we showed that topical imiquimod (IMQ) therapy is an efficacious treatment for high-grade squamous intraepithelial lesion (HSIL). The aim of the present study was to investigate the non-inferiority of a 16-week topical, self-applied IMQ therapy compared to large loop excision of the transformation zone (LLETZ) in...
Although regression models play a central role in the analysis of medical research projects, there still exist many misconceptions on various aspects of modeling leading to faulty analyses. Indeed, the rapidly developing statistical methodology and its recent advances in regression modeling do not seem to be adequately reflected in many medical pub...
Poisson regression can be challenging with sparse data, in particular with certain data constellations where maximum likelihood estimates of regression coefficients do not exist. This paper provides a comprehensive evaluation of methods that give finite regression coefficients when maximum likelihood estimates do not exist, including Firth’s genera...
Background
While machine learning (ML) algorithms may predict cardiovascular outcomes more accurately than statistical models, their result is usually not representable by a transparent formula. Hence, it is often unclear how specific values of predictors lead to the predictions. We aimed to demonstrate with graphical tools how predictor-risk relat...
Background
Chronic kidney disease (CKD) is a well-established complication in people with diabetes mellitus. Roughly one quarter of prevalent patients with diabetes exhibit a CKD stage of 3 or higher and the individual course of progression is highly variable. Therefore, there is a clear need to identify patients at high risk for fast progression a...
Hospital length of stay (LOS) is an important clinical and economic outcome and knowing its predictors could lead to better planning of resources needed during hospitalization. This analysis sought to identify structure, patient, and nutrition-related predictors of LOS available at the time of admission in the global nutritionDay dataset and to ana...
Introduction:
Patients with unprovoked venous thromboembolism (VTE) have a high recurrence risk, and, according to guidelines, should receive extended oral anticoagulation (OAC). OAC prevents recurrence in most patients but may cause major bleeding. Patients with a low recurrence risk could therefore benefit from limited OAC duration. The Vienna pr...
Background
For finite samples with binary outcomes, penalized logistic regression such as ridge logistic regression has the potential of achieving smaller mean squared errors (MSE) of coefficients and predictions than maximum likelihood estimation. There is evidence, however, that ridge logistic regression can result in highly variable calibration s...
Background
Statistical model building requires selection of variables for a model depending on the model’s aim. In descriptive and explanatory models, a common recommendation often met in the literature is to include all variables in the model which are assumed or known to be associated with the outcome independent of their identification with data...
Poster for the ISCB42, Lyon, France, 2021
Health care claims databases maintained by social insurance institutions provide rich and sometimes easily accessible data sources for epidemiological research. Interpreting the registered claims, for example, drug prescriptions, as proxies for the condition of interest, for example, diabetes, they allow for nationwide prevalence estimation. We ill...
Background and Aims
Kidney transplantation is considered to be the optimal treatment strategy for eligible end stage renal disease patients. However, the body of evidence to underpin the anticipated survival advantage for kidney transplant recipients is weak, as random treatment allocation to either kidney transplantation or remaining on dialysis i...
Regression models have been in use for decades to explore and quantify the association between a dependent response and several independent variables in environmental sciences, epidemiology and public health. However, researchers often encounter situations in which some independent variables exhibit high bivariate correlation, or may even be collin...
Background
The use of potentially inappropriate medication (PIM) in the population of older adults may result in adverse drug events (ADE) even after short-term exposure, especially when it is prescribed to patients with chronic kidney disease (CKD). To limit ADE in the treatment of older adults, PIM lists have been constructed as a source o...
Background: The induction of donor-specific immunological tolerance could improve outcome after kidney transplantation. However, no tolerance protocol is available for routine clinical use. Chimerism-based regimens hold promise, but their widespread application is impeded in part by unresolved safety issues. This study tests the hypothesis that the...
For finite samples with binary outcomes, penalized logistic regression such as ridge logistic regression (RR) has the potential of achieving smaller mean squared errors (MSE) of coefficients and predictions than maximum likelihood estimation. There is evidence, however, that RR is sensitive to small or sparse data situations, yielding poor performan...
Penalized logistic regression methods are frequently used to investigate the relationship between a binary outcome and a set of explanatory variables. The model performance can be assessed by measures such as the concordance statistic (c-statistic), the discrimination slope and the Brier score. Often, data resampling techniques, e.g. crossvalidatio...
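The three performance measures named above can be computed directly from observed outcomes and predicted probabilities. The sketch below uses the usual textbook definitions; the function names are illustrative assumptions and are not taken from the paper or from any particular software package. It does not cover the resampling step the abstract goes on to discuss.

```python
def c_statistic(y, p):
    """Share of event/non-event pairs in which the event has the higher
    predicted probability; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def discrimination_slope(y, p):
    """Mean predicted probability in events minus that in non-events."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)


if __name__ == "__main__":
    y = [0, 0, 1, 1]              # observed binary outcomes (toy data)
    p = [0.1, 0.4, 0.35, 0.8]     # predicted probabilities (toy data)
    print(c_statistic(y, p), brier_score(y, p), discrimination_slope(y, p))
```

Higher values of the c-statistic and the discrimination slope indicate better discrimination, while a lower Brier score indicates better overall accuracy of the predicted probabilities.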
Firth-type logistic regression has become a standard approach for the analysis of binary outcomes with small samples. Whereas it reduces the bias in maximum likelihood estimates of coefficients, bias towards 1/2 is introduced in the predicted probabilities. The stronger the imbalance of the outcome, the more severe is the bias in the predicted prob...
In the last decades, statistical methodology has developed rapidly, in particular in the field of regression modeling. Multivariable regression models are applied in almost all medical research projects. Therefore, the potential impact of statistical misconceptions within this field can be enormous. Indeed, the current theoretical statistical knowle...
Background:
How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years a...
Separation in logistic regression is a common problem causing failure of the iterative estimation process when finding maximum likelihood estimates. Firth's correction (FC) was proposed as a solution, providing estimates also in presence of separation. In this paper we evaluate whether ridge regression (RR) could be considered instead, specifically...
Background
Secondary prevention after acute coronary syndrome (ACS) plays a key role in the reduction of morbidity and mortality in this highly vulnerable patient population. Cardiac rehabilitation in particular has proved to be one of the most beneficial therapeutic approaches for the reduction of re-events and overall modification of cardiovascula...
Background
Secondary prevention after acute coronary syndrome (ACS) plays a key role in the reduction of morbidity and mortality in this highly vulnerable patient population. Lipid-lowering therapy with high-intensity statins (atorvastatin and rosuvastatin) in particular has proved to be one of the most beneficial therapeutic approaches for the r...
Background
Secondary prevention after acute coronary syndrome (ACS) plays a key role in the reduction of morbidity and mortality in this highly vulnerable patient population. Dual anti-platelet therapy (DAPT), comprising aspirin plus a P2Y12 inhibitor, in particular has proved to be one of the most beneficial therapeutic approaches for the reductio...
Statistical models are often fitted to obtain a concise description of the association of an outcome variable with some covariates. Even if background knowledge is available to guide preselection of covariates, stepwise variable selection is commonly applied to remove irrelevant ones. This practice may introduce additional variability and selection...
Objective:
Postscreening colorectal cancer (PSCRC) after screening colonoscopy is associated with endoscopists' performance and characteristics of resected lesions. Prior studies have shown that adenoma detection rate (ADR) is a decisive factor for PSCRC, but correlations with other parameters need further analysis and ADR may change over time.
D...
Causal inference from observational studies can be challenging with a rare outcome event and many potential confounding variables. The probability of an individual to receive the treatment given the patient’s information, known as the propensity score, can be used in the process of matching or weighting the observational data to combat the inherent...
Release of medical data is important in the scientific world, but it compromises patient privacy, which is a major concern. Releasing perturbed versions of the original data sets might preserve some degree of patient privacy, but more privacy leads to less utility. With proteomic biomarker data, perturbation is complicated by zero-inflated and non-...
Objectives
Risk prediction in implant dentistry presents specific challenges including the dependence of observations from patients with multiple implants and rare outcome events. The aim of this study was to use advanced statistical methods based on penalized regression to assess risk factors in implant dentistry.
Material and Methods
We conducte...
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Background
To compare open repair (OR) with endovascular aneurysm repair (EVAR) for the management of ruptured infrarenal abdominal aortic aneurysms (RAAA) in a cohort study over a period of 15 years, using inverse probability of treatment weights.
Material and Methods
From 2000/01 through 2015/12, 136 patients were treated for RAAA; 98 (72.1%) underwent OR, 38 (27.9%) were t...
Equations predicting the risk of occurrence of cardiovascular disease (CVD) are used in primary care to identify high-risk individuals among the general population. To improve the predictive performance of such equations, we updated the Framingham general CVD 1991 and 2008 equations and the Pooled Cohort equations for atherosclerotic CVD within fiv...
Variable selection for regression models plays a key role in the analysis of biomedical data. However, inference after selection is not covered by classical statistical frequentist theory which assumes a fixed set of covariates in the model. We review two interpretations of inference after selection: the full model view, in which the parameters of...
Objective
To review and critically appraise published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at risk of being admitted to hospital for covid-19 pneumonia.
Design
Rap...
Generic medications offer substantial potential cost savings to health systems compared to their branded counterparts. In Europe and the US, they are only approved if they are bioequivalent to the respective originator product. Nevertheless, the lack of clinical outcomes is sometimes used as the reason for hesitancy in prescribing generics. We perf...
Objective: To review and critically appraise published and preprint reports of models that aim to predict either (i) presence of existing COVID-19 infection, or (ii) future complications in individuals already diagnosed with COVID-19. Any models to identify subjects at risk for COVID-19 in the general population were also included.
Design: Rapid sy...
The recent discussion on the reproducibility of scientific results is particularly relevant for preclinical research with animal models. Within certain areas of preclinical research, there exists the tradition of repeating an experiment at least twice to demonstrate replicability. If the results of the first two experiments do not agree, then the e...
Background:
Although separate prediction models for donors and recipients were previously published, we identified a need to predict outcomes of donor/recipient simultaneously, as they are clearly not independent of each other.
Methods:
We used characteristics from transplantations performed at the Oslo University Hospital from 1854 live donors,...
Purpose:
Overactive bladder (OAB) syndrome has severe effects on quality of life. Certain drugs are known risk factors for OAB but have not been investigated in a population-wide cohort. The objective of this study was to investigate the role of prescription drugs in the etiology of the OAB.
Methods:
Retrospective cohort study using a population...
The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed ‘separation’ as the two outcome groups are separated by the values of a covariate...
PURPOSE: Although the discovery of prions was rewarded with a Nobel Prize, their existence was only attributed to a limited number of diseases. Recent evidence suggests that their role has been underestimated and several other proteins carry prion-like properties, like β-amyloid, and most recently p53. High-grade serous ovarian cancers (HGSOC) harb...
The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed “separation” as the two outcome groups are separated by the values of a covariate...
Chronic kidney disease (CKD) may progress to end-stage renal disease (ESRD) at different paces. Early markers of disease progression could facilitate and improve patient management. However, conventional blood and urine chemistry have proven unable to predict the progression of disease at early stages. Therefore, we performed untargeted plasma pepti...
Most research in transplant medicine includes statistical analysis of observed data. Too often authors solely rely on p‐values derived by statistical tests to answer their research questions. A p‐value smaller than 0.05 is typically used to declare ‘statistical significance’ and hence, ‘proves’ that, e.g., an intervention has an effect on the outco...
Objectives:
The aim of this study was to determine stroke rates in patients who did or did not undergo routine computed tomography angiography (CTA) aortic imaging before isolated coronary artery bypass grafting (CABG).
Methods:
We conducted a retrospective analysis of a prospectively maintained single-centre registry. Between 2009 and 2016, a t...
Clinical risk factors explain only a fraction of the variability of estimated glomerular filtration rate (eGFR) decline in people with type 2 diabetes. Cross-omics technologies by virtue of a wide spectrum screening of plasma samples have the potential to identify biomarkers for the refinement of prognosis in addition to clinical variables. Here we...
How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More rece...
The sphingolipid and lysophosphatidate regulatory networks impact diverse mechanisms attributed to cancer cells and the tumor immune microenvironment. Deciphering the complexity demands implementation of a holistic approach combined with higher-resolution techniques. We implemented a multi-modular integrative approach consolidating the latest accom...