The EpiLink record linkage software: presentation and results of linkage test on cancer registry files.
ABSTRACT Record linkage, the process of bringing together separately compiled but related records from different databases, is essential in many areas of biomedical research. We developed a record linkage program (EpiLink), which employs a simple mathematical approach. We describe the program and present results obtained testing it in a linkage task.
EpiLink was designed to be flexible, with user-friendly settings to tailor linkage and operating parameters to specific linkage tasks and to employ deterministic, probabilistic or sequential deterministic-probabilistic linkage strategies as required. The user can also standardize data format, examine linkage results and accept or discard them. We used EpiLink to link a subset of cases of the Lombardy Cancer Registry (20,724 records) with the Social Security file of the population (1,021,846 records) covered by the registry. The linkage strategy was deterministic, followed by several probabilistic linkage steps.
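The sequential strategy described above (a deterministic pass, then probabilistic passes on the records left unlinked) can be sketched roughly as follows. This is a minimal illustration, not EpiLink's actual algorithm: the field names, the weights, the threshold and the use of `difflib.SequenceMatcher` as a similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical field weights and acceptance threshold (not EpiLink's values).
WEIGHTS = {"surname": 0.4, "name": 0.3, "birth_date": 0.3}
THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(rec_a: dict, rec_b: dict) -> float:
    """Weighted sum of per-field similarities."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def record_key(rec: dict) -> tuple:
    """Exact-match key built from all compared fields."""
    return tuple(rec[f].lower() for f in WEIGHTS)


def link(cases: list, population: list) -> dict:
    """Map each linked case id to (population id, step that produced the link)."""
    # Step 1: deterministic, requires exact agreement on every field.
    index = {}
    for person in population:
        index.setdefault(record_key(person), []).append(person)
    links, unmatched = {}, []
    for case in cases:
        hits = index.get(record_key(case), [])
        if len(hits) == 1:
            links[case["id"]] = (hits[0]["id"], "deterministic")
        else:
            unmatched.append(case)
    # Step 2: similarity scoring of the cases left unlinked by step 1.
    for case in unmatched:
        best = max(population, key=lambda p: score(case, p), default=None)
        if best is not None and score(case, best) >= THRESHOLD:
            links[case["id"]] = (best["id"], "probabilistic")
    return links
```

The second step compares every unlinked case with every population record, which would be impractical for a file of about a million records; a real linkage would normally restrict comparisons to candidate blocks first. Links accepted in the second step are the ones a user would most likely want to inspect manually.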
Manual inspection of the results showed that EpiLink achieved 98.8% specificity and 96.5% sensitivity.
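For reference, sensitivity and specificity of a linkage can be computed from the counts obtained by manual review, using the standard definitions. The sketch below assumes those definitions; the abstract does not state how the counts were tallied, and the function names and the counts in the usage note are placeholders, not the study's data.

```python
def sensitivity(true_links: int, missed_links: int) -> float:
    """Share of true matches that the linkage actually found."""
    return true_links / (true_links + missed_links)


def specificity(true_non_links: int, false_links: int) -> float:
    """Share of true non-matches that the linkage correctly left unlinked."""
    return true_non_links / (true_non_links + false_links)
```

With placeholder counts, `sensitivity(90, 10)` gives 0.9 and `specificity(95, 5)` gives 0.95.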
EpiLink is a practical and accurate means of linking records from different databases that can be used by non-statisticians and is efficient in terms of human and financial resources.
- Source available from: Sabrina Fabiano
ABSTRACT: Birth defects are a leading cause of neonatal and infant mortality in Italy; however, little is known of the etiology of most defects. Improvements in diagnosis have revealed increasing numbers of clinically insignificant defects, while improvements in treatment have increased the survival of those with more serious and complex defects. For etiological studies, prevention, and management, it is important to have population-based monitoring which provides reliable data on the prevalence at birth of such defects. We recently initiated population-based birth defect monitoring in the Provinces of Mantova, Sondrio and Varese of the Region of Lombardy, northern Italy, and report data for the first year of operation (1999). The registry uses all-electronic source files (hospital discharge files, death certificates, regional health files, and pathology reports) and a proven case-generation methodology, which is described. The data were checked manually by consulting clinical records in hospitals. Completeness was checked against birth certificates by capture-recapture. Data on cases were coded according to the four-digit malformation codes of the International Classification of Diseases, Ninth Revision (ICD-9). We present data only on selected defects. We found 246 selected birth defects in 12,008 live births in 1999, 148 among boys and 98 among girls. Congenital heart defects (particularly septal defects) were the most common (90.8/10,000), followed by defects of the genitourinary tract (34.1/10,000) (particularly hypospadias in boys), digestive system (23.3/10,000) and central nervous system (14.9/10,000), orofacial clefts (10.8/10,000) and Down syndrome (8.3/10,000). Completeness was satisfactory: analysis of birth certificates resulted in the addition of two birth defect cases to the registry. This is the first population-based analysis of selected major birth defects in the Region. The high birth prevalences for septal heart defect and hypospadias are probably due to the inclusion of minor defects and lack of coding standardization; the latter problem also seems important for other defects. However, the data produced are useful for estimating the demands made on the health system by babies with birth defects.
Population Health Metrics 02/2007; 5:4. · 2.11 Impact Factor
- ABSTRACT: High circulating glucose has been associated with increased risk of breast cancer (BC). There may also be a link between serum glucose and prognosis in women treated for BC. We assessed the effect of peridiagnostic fasting blood glucose and body mass index (BMI) on long-term BC prognosis. We retrospectively investigated 1,261 women diagnosed and treated for stage I-III BC at the National Cancer Institute, Milan, in 1996, 1999 and 2000. Data on blood tests and follow-up were obtained by linking electronic archives, with follow-up to end of 2009. Multivariate Cox modelling estimated hazard ratios (HR) with 95% confidence intervals (CI) for distant metastasis, recurrence and death (all causes) in relation to categorized peridiagnostic fasting blood glucose and BMI. Mediation analysis investigated whether blood glucose mediated the BMI-breast cancer prognosis association. The risks of distant metastasis were significantly higher for all other quintiles compared to the lowest glucose quintile (reference <87 mg/dL) (respective HRs: 1.99, 95% CI 1.23-3.24; 1.85, 95% CI 1.14-3.0; 1.73, 95% CI 1.07-2.8; and 1.91, 95% CI 1.15-3.17). The risk of recurrence was significantly higher for all other glucose quintiles compared to the first. The risk of death was significantly higher than reference in the second, fourth and fifth quintiles. Women with BMI ≥ 25 kg/m² had significantly greater risks of recurrence and distant metastasis than those with BMI < 25 kg/m², irrespective of blood glucose. The increased risks remained invariant over a median follow-up of 9.5 years. Mediation analysis indicated that glucose and BMI had independent effects on BC prognosis. Peridiagnostic high fasting glucose and obesity predict worsened short- and long-term outcomes in BC patients. Maintaining healthy blood glucose levels and normal weight may improve prognosis.
Breast Cancer Research and Treatment 04/2013; · 4.47 Impact Factor
- ABSTRACT: Automated software for cancer registration, called Open Registry and developed by ourselves, was adopted by the Varese (population-based) Cancer Registry in 1997. Since the use of automated cancer registration is increasing, it is important to assess the quality and completeness of the automated data being produced. In this study, we assessed the completeness of the automatically generated data by comparison with a gold standard of all cases identified by manual and automatic systems for the year 1997, when the automated system was introduced and the manual system was still in operation. We also evaluated the efficiency of the automated system. 5027 cases were generated automatically; 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Sixty-nine cases (1.3%) were not recorded automatically, the most common reason (0.8%) being that the incidence record was dated 1998, even though the case was incident in 1997. A total of 98.7% of all cases found were picked up by the automated system. A completeness figure of 98.7% indicates that the automatic procedure is a valid alternative to manual methods for routine case generation. The fact that 59% of cases were registered automatically indicates that the system can speed up data production and enhance registry efficiency.
Journal of Biomedical Informatics 03/2008; 41(1):24-32. · 2.13 Impact Factor