Bias due to missing SEER data in D'Amico risk stratification of prostate cancer.
ABSTRACT We examined the degree of exclusion bias that may occur due to missing data when grouping prostate cancer cases from the SEER (Surveillance, Epidemiology and End Results) database into D'Amico clinical risk groups. Exclusion bias may occur because D'Amico classification requires all 3 tumor variables (prostate specific antigen, clinical T stage and Gleason score) to be known, and data may not be missing at random.
From the SEER database we identified 132,606 men with incident prostate cancer from 2004 to 2006. We documented age, race, Gleason score, clinical T stage, PSA and geographic region. Men were categorized into D'Amico risk groups. Those with 1 or more unknown tumor variables (prostate specific antigen, T stage and/or Gleason score) were labeled unclassified. We compared the value of the other 2 known clinical variables for men with known vs unknown prostate specific antigen, Gleason score and T stage. Demographics were compared for those with and without missing data. Results were compared using chi-square and logistic regression.
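The grouping rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the cut points used here (PSA 10 and 20 ng/ml, Gleason 6/7/8, T2a/T2b/T2c) are the commonly cited D'Amico criteria and are an assumption, since the abstract does not list them. The function names and the T stage labels are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Clinical T substages in increasing order (assumed labels, not SEER codes).
T_ORDER = ["T1a", "T1b", "T1c", "T2a", "T2b", "T2c", "T3a", "T3b", "T4"]

Case = Tuple[Optional[float], Optional[int], Optional[str]]


def damico_group(psa: Optional[float],
                 gleason: Optional[int],
                 t_stage: Optional[str]) -> str:
    """Return 'low', 'intermediate', 'high' or 'unclassified'."""
    # As in the abstract: any unknown tumor variable means unclassified.
    # A T stage label outside T_ORDER is also treated as unknown here.
    if psa is None or gleason is None or t_stage not in T_ORDER:
        return "unclassified"
    t_rank = T_ORDER.index(t_stage)
    # Assumed criteria: any single high risk feature gives high risk.
    if psa > 20 or gleason >= 8 or t_rank >= T_ORDER.index("T2c"):
        return "high"
    # Assumed criteria: any single intermediate feature gives intermediate.
    if psa > 10 or gleason == 7 or t_stage == "T2b":
        return "intermediate"
    return "low"


def excluded_fraction(cases: Iterable[Case]) -> float:
    """Share of cases that a complete-case analysis would drop."""
    groups = [damico_group(*case) for case in cases]
    return groups.count("unclassified") / len(groups)
```

Note that under this strict rule a case with, for example, Gleason score 9 but unknown T stage is still labeled unclassified, even though one known variable alone would place it in the high risk group.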
Of the men 33% had 1 or more unknown tumor variables with T stage the most commonly missing variable. There was no clinically significant difference in the value of the other 2 known tumor variables when T stage or prostate specific antigen was missing. Men older than 75 years were more likely to have unknown variables than younger men. There was significant geographic variation in the frequency of unclassified D'Amico data.
In studies in which the data set is limited to men who can be classified into a D'Amico risk group, 33% of eligible patients are excluded from analysis. Such men are older and come from certain SEER registries, but they have tumor characteristics similar to those of men with complete data.
ABSTRACT: To investigate the completeness of TNM (Tumor-Node-Metastasis) staging for prostate cancer (PC) in the Danish Cancer Registry (DCR). We identified 20,184 men registered with first-time PC in the DCR between 2004 and 2009. These patients were linked to the Danish National Patient Register to obtain data on comorbidity according to the Charlson Comorbidity Index (CCI). We calculated the completeness and corresponding 95% confidence intervals (CI) of TNM staging overall and by the individual components. We also defined a clinically-based algorithm classifying PC into four stage categories: localized, regional, distant, and unknown. The overall completeness of TNM staging was 34.2% (95% CI: 34%-35%). TNM completeness improved gradually over time, reaching 41.2% in 2009. TNM completeness decreased substantially with age, from 75.0% among patients 0-39 years to 11.3% among patients 80 years or older. Similarly, completeness decreased with increasing comorbidity level, from 37.6% among patients with low CCI to 20.3% among those with high CCI. When classifying T1 cancer as a complete registration regardless of missing N or M stage, the overall TNM completeness increased to 48.7% (95% CI: 48%-49%). According to the clinically-based staging algorithm, 70.5% of PC cases could be categorized into a definite clinical stage. One-third of PC patients had a complete registration of all TNM components in the DCR. Although TNM completeness improved over time, older age and high comorbidity were consistently associated with missing TNM staging. Research and monitoring based on cancer registries such as the DCR should account for missing TNM staging. Failing to do so could lead to biased results in stage-specific analyses.
Clinical Epidemiology 01/2012; 4:17-23.
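As a rough check on the overall completeness figure, the interval can be approximated from the two numbers given in the abstract. This is a sketch under stated assumptions: the abstract does not say which interval method was used, so a normal-approximation (Wald) interval is assumed here, and the count of complete registrations is back-calculated from 34.2% of 20,184 rather than reported. The function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


n_patients = 20184
# Assumption: approximate count recovered from the reported 34.2%.
n_complete = round(0.342 * n_patients)

low, high = wald_ci(n_complete, n_patients)
print(f"completeness {n_complete / n_patients:.1%}, 95% CI {low:.3f} to {high:.3f}")
```

Rounded to two decimals, the bounds come out at 0.34 and 0.35, in line with the interval reported for overall completeness.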