Using routinely collected health data to investigate the association between ethnicity and breast cancer incidence and survival: What is the impact of missing data and multiple ethnicities?

Cancer Epidemiology Group, Centre for Epidemiology & Biostatistics, University of Leeds, Level 6 Bexley Wing, St. James' University Hospital, Leeds, UK.
Ethnicity and Health (Impact Factor: 1.67). 04/2011; 16(3):201-12. DOI: 10.1080/13557858.2011.561301
Source: PubMed

ABSTRACT The aims of this study were to: (1) investigate the relationship between ethnicity and breast cancer incidence and survival using cancer registry and Hospital Episode Statistics (HES) data; and (2) assess the impact of missing data and the recording of multiple ethnicities for some patients.
A total of 48,234 breast cancer patients diagnosed between 1997 and 2003 in two English regions were identified. Ethnicity was missing in 16% of cases. Multiple imputation (10 iterations) of missing ethnicity was undertaken using a range of predictor variables. Multiple ethnicities for a single patient were recorded in 4% of cases. Three methods of assigning ethnicity were used: 'most popular' code, 'last recorded' code, and proportions calculated using all recorded episodes for each patient. Age-standardised incidence rate ratios (IRR) and 5-year survival were calculated before and after imputation for the three methods of assigning ethnicity.
Breast cancer incidence was lower in the South Asian group (IRR=0.59, 95% confidence interval [CI] 0.51-0.69 compared to the White group). In unadjusted analyses, the South Asian group had consistently higher survival compared with the White group (hazard ratio [HR]=0.81, 95% CI 0.68-0.95). After adjustment for age and stage, there were no survival differences amongst the White, South Asian and Black groups. Survival was higher in the 'Other' ethnic group when using the 'last recorded' method to assign ethnicity (HR=0.62, 95% CI 0.45-0.85 compared with the White group). The results were similar before and after imputation, using all three methods of assigning ethnicity.
Breast cancer incidence was lower in the South Asian group than in the White group. After adjusting for casemix there were no consistent survival differences amongst the ethnic groups. Although the impact of missing data and multiple ethnicities was minimal in this study, researchers should always consider these issues, as the results may not be generalisable to other populations and datasets.
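The three methods of assigning ethnicity named in the abstract ('most popular' code, 'last recorded' code, and proportions across all recorded episodes) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each patient's ethnicity codes are held in a chronologically ordered list with missing codes already excluded, and that a tie for 'most popular' goes to the most recently recorded of the tied codes; the abstract does not say how ties were resolved. All function names and the example data are hypothetical.

```python
from collections import Counter


def most_popular(codes):
    """Return the code recorded most often for one patient.

    Assumption (not stated in the abstract): a tie is resolved in
    favour of the most recently recorded of the tied codes.
    """
    if not codes:
        raise ValueError("no ethnicity codes recorded")
    counts = Counter(codes)
    top = max(counts.values())
    # Walk backwards through the episodes so a tie goes to the latest code.
    for code in reversed(codes):
        if counts[code] == top:
            return code


def last_recorded(codes):
    """Return the code from the most recent episode."""
    if not codes:
        raise ValueError("no ethnicity codes recorded")
    return codes[-1]


def proportions(codes):
    """Return each code's share of the patient's recorded episodes."""
    if not codes:
        raise ValueError("no ethnicity codes recorded")
    n = len(codes)
    return {code: count / n for code, count in Counter(codes).items()}


# Hypothetical patient with five episodes, oldest first.
episodes = ["White", "South Asian", "South Asian", "White", "South Asian"]

print(most_popular(episodes))
print(last_recorded(episodes))
print(proportions(episodes))
```

Under the proportions method a patient contributes fractionally to each ethnic group rather than being assigned to one, which is why it can give different group totals from the two single-code methods.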

  • Source
    • "Incomplete ethnicity data has meant research to date has had little choice but to utilise methodologies such as 1) use of proxy variables where available such as Country of Birth which have distinct limitations, 2) use of name recognition software such as Nam Pehchan and SANGRA where applicability is limited to South Asians, 3) data linkage has proved useful, 4) sensitivity analyses and 5) multiple imputation or 6) conduct studies tailored to specific populations [31-37]. Landmark reports such as 'Cancer incidence and survival by major ethnic group, England, 2002-2006' produced by the National Cancer Intelligence Network are based upon incomplete data despite linking HES and national cancer registry datasets to form the National Cancer Data Repository [1]. "
    ABSTRACT: Ethnicity data collection has been proven to be important in health care but, despite government initiatives, remains incomplete and mostly unvalidated in the UK. Accurate self-reported ethnicity data would enable experts to assess inequalities in health and access to services and help to ensure resources are targeted appropriately. The aim of this paper is to explore the reasons for the observed gap in ethnicity data by examining the perceptions and experiences of healthy South Asian volunteers. South Asians are the largest ethnic minority group, accounting for 50% of all ethnic minorities in the UK 2001 census. Five focus groups, conducted by trained facilitators in the native language of each group, recruited 36 South Asian volunteers from local community centres and places of worship. The topic guide focused on five key areas: 1) general opinions on the collection of ethnicity, 2) experiences of providing ethnicity information, 3) categories used in practice, 4) opinions of other indicators of ethnicity, e.g. language, religion and culture, and 5) views on how this information should be collected. The translated transcripts were analysed using a qualitative thematic approach. The findings of this Cancer Research UK commissioned study revealed that participants felt that accurate recording of ethnicity data was important in healthcare, with several citing the increased prevalence of certain diseases in minority ethnic groups as an appropriate justification for improving these data. The overwhelming majority raised no objections to providing these data when the purpose of data collection is fully explained. This study confirmed that the collection of patients' ethnicity data is deemed important by potential patients, but there remains uncertainty and unease as to how the data may be used. A common theme running through the focus groups was the willingness to provide these data, strongly accompanied by a desire to have more information with regard to their use.
    BMC Public Health 03/2012; 12(1):243. DOI:10.1186/1471-2458-12-243 · 2.26 Impact Factor
  • Source
    ABSTRACT: Background and objectives: Although ethnic group variations in cancer exist, no multiethnic, population-based, longitudinal studies are available in Europe. Our objectives were to examine ethnic variation in all-cancer, and lung, colorectal, breast and prostate cancers. Design, setting, population, measures and analysis: This retrospective cohort study of 4.65 million people linked the 2001 Scottish Census (providing ethnic group) to cancer databases. With the White Scottish population as reference (value 100), directly age standardised rates and ratios (DASR and DASRR), and risk ratios, by sex and ethnic group with 95% CI were calculated for first cancers. In the results below, 95% CI around the DASRR excludes 100. Eight indicators of socio-economic position were assessed as potential confounders across all groups. Results: For all cancers the White Scottish population (100) had the highest DASRRs, Indians the lowest (men 45.9 and women 41.2) and White British (men 87.6 and women 87.3) and other groups were intermediate (eg, Chinese men 57.6). For lung cancer the DASRRs for Pakistani men (45.0), and women (53.5), were low and for any mixed background men high (174.5). For colorectal cancer the DASRRs were lowest in Pakistanis (men 32.9 and women 68.9), White British (men 82.4 and women 83.7), other White (men 77.2 and women 74.9) and Chinese men (42.6). Breast cancer in women was low in Pakistanis (62.2), Chinese (63.0) and White Irish (84.0). Prostate cancer was lowest in Pakistanis (38.7), Indian (62.6) and White Irish (85.4). No socio-economic indicator was a valid confounding variable across ethnic groups. Conclusions: The ‘Scottish effect’ does not apply across ethnic groups for cancer. The findings have implications for clinical care, prevention and screening, for example, responding appropriately to the known low uptake among South Asian populations of bowel screening might benefit from modelling of cost-effectiveness of screening, given comparatively low cancer rates.
    BMJ Open 09/2012; 2(5). DOI:10.1136/bmjopen-2012-001957 · 2.27 Impact Factor
  • ABSTRACT: Purpose: Information on cancer stage at diagnosis is critical for population studies investigating cancer care and outcomes. Few studies have examined the factors which impact (1) staging or (2) outcomes for patients who are registered as having unknown stage. This study (1) investigated the prevalence of unknown stage at diagnosis on the New Zealand Cancer Registry (NZCR); (2) explored factors which predict unknown stage; and described (3) receipt of surgery and (4) survival outcomes for patients with unknown stage. Methods: Patients diagnosed with the most prevalent 18 cancers between 2006 and 2008 (N=41,489) were identified from the NZCR, with additional data obtained from mortality and hospitalisation databases. Logistic and Cox regression were used to investigate predictors of unknown stage and patient outcomes. Results: (1) Three distinct groups of cancers were found based on proportion of patients with unknown stage (low=up to 33% unknown stage; moderate=33-64%; high=65%+). (2) Increasing age was a significant predictor of unknown stage (adjusted odds ratios [ORs]: 1.18-1.24 per 5-year increase across groups). Patients with substantive comorbidity were more likely to have unknown stage but only for those cancers with a low (OR=2.65 [2.28-3.09]) or moderate (OR=1.17 [1.03-1.33]) proportion of patients with unknown stage. (3) Patients with unknown stage were significantly less likely to have received definitive surgery than those with local or regional disease across investigated cancers. (4) Patients with unknown stage had 28-day and 1-year survival which was intermediate between regional and distant disease. Discussion: We found that stage completeness differs widely by cancer site. In many cases, the proportion of unknown stage on a population-based register can be explained by patient, service and/or cancer related factors.
    03/2013; 37(4). DOI:10.1016/j.canep.2013.03.005