The EpiLink record linkage software: presentation and results of linkage test on cancer registry files.

Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy.
Methods of Information in Medicine (Impact Factor: 1.08). 02/2005; 44(1):66-71. DOI: 10.1267/METH05010066
Source: PubMed

ABSTRACT Record linkage, the process of bringing together separately compiled but related records from different databases, is essential in many areas of biomedical research. We developed a record linkage program (EpiLink) that employs a simple mathematical approach. We describe the program and present the results obtained by testing it in a linkage task.
EpiLink was designed to be flexible, with user-friendly settings that tailor linkage and operating parameters to specific linkage tasks, and to employ deterministic, probabilistic or sequential deterministic-probabilistic linkage strategies as required. The user can also standardize data format, examine linkage results, and accept or discard them. We used EpiLink to link a subset of cases of the Lombardy Cancer Registry (20,724 records) with the Social Security file of the population (1,021,846 records) covered by the registry. The linkage strategy was deterministic, followed by several probabilistic linkage steps.
Manual inspection of the results showed that EpiLink achieved 98.8% specificity and 96.5% sensitivity.
EpiLink is a practical and accurate means of linking records from different databases that can be used by non-statisticians and is efficient in terms of human and financial resources.
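
The sequential deterministic-probabilistic strategy described above can be illustrated with a minimal Python sketch. It is not EpiLink's implementation: the field names, the weights, the 0.85 threshold, and the use of a weighted string-similarity score in place of a true probabilistic model are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical identifying fields and agreement weights (not taken from EpiLink).
WEIGHTS = {"surname": 4.0, "first_name": 3.0, "birth_date": 5.0, "sex": 1.0}


def normalise(value):
    """Standardise a field value before comparison."""
    return str(value).strip().upper()


def deterministic_key(record):
    """Exact-match key built from all identifying fields."""
    return tuple(normalise(record[field]) for field in WEIGHTS)


def score(rec_a, rec_b):
    """Weighted mean of per-field string similarities, in the range 0..1."""
    weighted = sum(
        weight * SequenceMatcher(None, normalise(rec_a[f]), normalise(rec_b[f])).ratio()
        for f, weight in WEIGHTS.items()
    )
    return weighted / sum(WEIGHTS.values())


def link(cases, population, threshold=0.85):
    """Link cases to population records: exact step first, score-based step second."""
    index = {}
    for person in population:
        index.setdefault(deterministic_key(person), []).append(person)

    links, unmatched = {}, []
    # Step 1: deterministic. Accept only unambiguous exact matches.
    for case in cases:
        candidates = index.get(deterministic_key(case), [])
        if len(candidates) == 1:
            links[case["id"]] = (candidates[0]["id"], "deterministic")
        else:
            unmatched.append(case)

    # Step 2: score-based pass over the remaining cases (stand-in for the
    # probabilistic steps). A full scan is used here for simplicity only.
    for case in unmatched:
        best = max(population, key=lambda p: score(case, p), default=None)
        if best is not None and score(case, best) >= threshold:
            links[case["id"]] = (best["id"], "probabilistic")
    return links
```

With files of the size reported above, a full scan in the second step would be impractical; a real tool would restrict comparisons to candidate blocks (for example, records sharing a birth year), and links from the second step would still be reviewed manually, as the abstract describes.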

  • Source
    ABSTRACT: Automated procedures are increasingly used in cancer registration, and it is important that the data produced are systematically checked for consistency and accuracy. We evaluated an automated procedure for cancer registration adopted by the Lombardy Cancer Registry in 1997, comparing automatically generated diagnostic codes with those produced manually over one year (1997). The automatically generated cancer cases were produced by Open Registry algorithms. For manual registration, trained staff consulted clinical records, pathology reports and death certificates. The social security code, present and checked in both databases in all cases, was used to match the files in the automatic and manual databases. The cancer cases generated by the two methods were compared by manual revision. The automated procedure generated 5027 cases: 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Among the cases accepted automatically, discrepancies in data items (surname, first name, sex and date of birth) were found in 8.5% of cases, and discrepancies in the first three digits of the ICD-9 code in 1.6%. Among flagged cases, cancers of the female genital tract, hematopoietic system, metastatic and ill-defined sites, and oropharynx predominated. The usual reasons were use of specific vs. generic codes, presence of multiple primaries, and use of extranodal vs. nodal codes for lymphomas. The percentage of automatically accepted cases ranged from 83% for breast and thyroid cancers to 13% for metastatic and ill-defined cancer sites. Since 59% of cases were accepted automatically and contained relatively few, mostly trivial discrepancies, the automatic procedure is efficient for routine case generation, effectively cutting the workload required for routine case checking by this amount. Among cases not accepted automatically, discrepancies were mainly due to variations in coding practice.
    Population Health Metrics 02/2006; 4:10. · 2.11 Impact Factor
  • Source
    ABSTRACT: High circulating glucose has been associated with increased risk of breast cancer (BC). There may also be a link between serum glucose and prognosis in women treated for BC. We assessed the effect of peridiagnostic fasting blood glucose and body mass index (BMI) on long-term BC prognosis. We retrospectively investigated 1,261 women diagnosed and treated for stage I-III BC at the National Cancer Institute, Milan, in 1996, 1999 and 2000. Data on blood tests and follow-up were obtained by linking electronic archives, with follow-up to end of 2009. Multivariate Cox modelling estimated hazard ratios (HR) with 95% confidence intervals (CI) for distant metastasis, recurrence and death (all causes) in relation to categorized peridiagnostic fasting blood glucose and BMI. Mediation analysis investigated whether blood glucose mediated the BMI-breast cancer prognosis association. The risks of distant metastasis were significantly higher for all other quintiles compared to the lowest glucose quintile (reference <87 mg/dL) (respective HRs: 1.99, 95% CI 1.23-3.24; 1.85, 95% CI 1.14-3.0; 1.73, 95% CI 1.07-2.8; and 1.91, 95% CI 1.15-3.17). The risk of recurrence was significantly higher for all other glucose quintiles compared to the first. The risk of death was significantly higher than reference in the second, fourth and fifth quintiles. Women with BMI ≥ 25 kg/m² had significantly greater risks of recurrence and distant metastasis than those with BMI < 25 kg/m², irrespective of blood glucose. The increased risks remained invariant over a median follow-up of 9.5 years. Mediation analysis indicated that glucose and BMI had independent effects on BC prognosis. Peridiagnostic high fasting glucose and obesity predict worsened short- and long-term outcomes in BC patients. Maintaining healthy blood glucose levels and normal weight may improve prognosis.
    Breast Cancer Research and Treatment 04/2013; · 4.47 Impact Factor
  • Source
    ABSTRACT: Automated software for cancer registration, called Open Registry and developed by ourselves, was adopted by the Varese (population-based) Cancer Registry starting from 1997. Since the use of automated cancer registration is increasing, it is important to assess the quality and completeness of the automated data being produced. In this study, we assessed the completeness of the automatically generated data by comparison with a gold standard of all cases identified by manual and automatic systems for 1997, the year when the automated system was introduced and the manual system was still in operation. We also evaluated the efficiency of the automated system. 5027 cases were generated automatically; 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Sixty-nine cases (1.3%) were not recorded automatically, the most common reason (0.8%) being that the incidence record was dated 1998, even though the case was incident in 1997. A total of 98.7% of all cases found were picked up by the automated system. A completeness figure of 98.7% indicates that the automatic procedure is a valid alternative to manual methods for routine case generation. The fact that 59% of cases were registered automatically indicates that the system can speed up data production and enhance registry efficiency.
    Journal of Biomedical Informatics 03/2008; 41(1):24-32. · 2.13 Impact Factor

