The EpiLink record linkage software: Presentation and results of linkage test on cancer registry files

Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori Via Venezian 1, 20133 Milan, Italy.
Methods of Information in Medicine (Impact Factor: 1.08). 02/2005; 44(1):66-71. DOI: 10.1267/METH05010066
Source: PubMed

ABSTRACT Record linkage, the process of bringing together separately compiled but related records from different databases, is essential in many areas of biomedical research. We developed a record linkage program (EpiLink), which employs a simple mathematical approach. We describe the program and present the results obtained by testing it in a linkage task.
EpiLink was designed to be flexible, with user-friendly settings that tailor linkage and operating parameters to specific linkage tasks, and to employ deterministic, probabilistic or sequential deterministic-probabilistic linkage strategies as required. The user can also standardize data format, examine linkage results and accept or discard them. We used EpiLink to link a subset of cases of the Lombardy Cancer Registry (20,724 records) with the Social Security file of the population (1,021,846 records) covered by the registry. The linkage strategy was deterministic, followed by several probabilistic linkage steps.
Manual inspection of the results showed that EpiLink achieved 98.8% specificity and 96.5% sensitivity.
EpiLink is a practical and accurate means of linking records from different databases that can be used by non-statisticians and is efficient in terms of human and financial resources.
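
The sequential deterministic-probabilistic strategy mentioned in the abstract can be illustrated with a minimal Python sketch. This is not EpiLink's algorithm, which the abstract does not specify: the field names, the weights in `WEIGHTS`, the 0.85 threshold and the use of `difflib.SequenceMatcher` as a string similarity measure are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical field weights (assumption, not taken from EpiLink).
WEIGHTS = {"surname": 0.4, "first_name": 0.3, "birth_date": 0.2, "sex": 0.1}


def deterministic_key(rec):
    """Exact-match key built from normalised identifying fields."""
    return (
        rec["surname"].strip().upper(),
        rec["first_name"].strip().upper(),
        rec["sex"],
        rec["birth_date"],
    )


def similarity(a, b):
    """String similarity in [0, 1]; 1.0 means identical after upper-casing."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def score(a, b):
    """Weighted sum of per-field similarities between two records."""
    return sum(w * similarity(str(a[f]), str(b[f])) for f, w in WEIGHTS.items())


def link(cases, population, threshold=0.85):
    """Link each case to a population record: exact match first, then best score."""
    index = {deterministic_key(p): p for p in population}
    links, unmatched = {}, []

    # Step 1: deterministic linkage on the exact key.
    for c in cases:
        hit = index.get(deterministic_key(c))
        if hit is not None:
            links[c["id"]] = (hit["id"], "deterministic", 1.0)
        else:
            unmatched.append(c)

    # Step 2: probabilistic linkage for the remaining cases.
    # A full scan is used for brevity; a file of a million records
    # would need blocking to restrict the candidate pairs.
    for c in unmatched:
        best = max(population, key=lambda p: score(c, p), default=None)
        if best is not None and score(c, best) >= threshold:
            links[c["id"]] = (best["id"], "probabilistic", round(score(c, best), 3))
    return links
```

Cases scoring below the threshold are left unlinked, which corresponds to the pairs a user would examine and accept or discard by hand.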

Available from: Andrea Tittarelli, Nov 14, 2014
  • Source
    • "Open Registry then links the records of the sources files to aggregate information for person. This is done using deterministic and probabilistic methods [13]. Finally data consistency checks are performed, again by ad-hoc routines within Open Registry. "
    ABSTRACT: Automated procedures are increasingly used in cancer registration, and it is important that the data produced are systematically checked for consistency and accuracy. We evaluated an automated procedure for cancer registration adopted by the Lombardy Cancer Registry in 1997, comparing automatically-generated diagnostic codes with those produced manually over one year (1997). The automatically generated cancer cases were produced by Open Registry algorithms. For manual registration, trained staff consulted clinical records, pathology reports and death certificates. The social security code, present and checked in both databases in all cases, was used to match the files in the automatic and manual databases. The cancer cases generated by the two methods were compared by manual revision. The automated procedure generated 5027 cases: 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Among the cases accepted automatically, discrepancies in data items (surname, first name, sex and date of birth) constituted 8.5% of cases, and discrepancies in the first three digits of the ICD-9 code constituted 1.6%. Among flagged cases, cancers of female genital tract, hematopoietic system, metastatic and ill-defined sites, and oropharynx predominated. The usual reasons were use of specific vs. generic codes, presence of multiple primaries, and use of extranodal vs. nodal codes for lymphomas. The percentage of automatically accepted cases ranged from 83% for breast and thyroid cancers to 13% for metastatic and ill-defined cancer sites. Since 59% of cases were accepted automatically and contained relatively few, mostly trivial discrepancies, the automatic procedure is efficient for routine case generation, effectively cutting the workload required for routine case checking by this amount. Among cases not accepted automatically, discrepancies were mainly due to variations in coding practice.
    Population Health Metrics 02/2006; 4(1):10. DOI:10.1186/1478-7954-4-10 · 2.11 Impact Factor
  • Source
    ABSTRACT: Birth defects are a leading cause of neonatal and infant mortality in Italy; however, little is known of the etiology of most defects. Improvements in diagnosis have revealed increasing numbers of clinically insignificant defects, while improvements in treatment have increased the survival of those with more serious and complex defects. For etiological studies, prevention, and management, it is important to have population-based monitoring which provides reliable data on the prevalence at birth of such defects. We recently initiated population-based birth defect monitoring in the Provinces of Mantova, Sondrio and Varese of the Region of Lombardy, northern Italy, and report data for the first year of operation (1999). The registry uses all-electronic source files (hospital discharge files, death certificates, regional health files, and pathology reports) and a proven case-generation methodology, which is described. The data were checked manually by consulting clinical records in hospitals. Completeness was checked against birth certificates by capture-recapture. Data on cases were coded according to the four-digit malformation codes of the International Classification of Diseases, Ninth Revision (ICD-9). We present data only on selected defects. We found 246 selected birth defects in 12,008 live births in 1999, 148 among boys and 98 among girls. Congenital heart defects (particularly septal defects) were the most common (90.8/10,000), followed by defects of the genitourinary tract (34.1/10,000) (particularly hypospadias in boys), digestive system (23.3/10,000) and central nervous system (14.9/10,000), orofacial clefts (10.8/10,000) and Down syndrome (8.3/10,000). Completeness was satisfactory: analysis of birth certificates resulted in the addition of two birth defect cases to the registry. This is the first population-based analysis on selected major birth defects in the Region. The high birth prevalences for septal heart defect and hypospadias are probably due to the inclusion of minor defects and lack of coding standardization; the latter problem also seems important for other defects. However, the data produced are useful for estimating the demands made on the health system by babies with birth defects.
    Population Health Metrics 02/2007; 5:4. DOI:10.1186/1478-7954-5-4 · 2.11 Impact Factor
  • Source
    ABSTRACT: Automated software for cancer registration, called Open Registry and developed by ourselves, was adopted by the Varese (population-based) Cancer Registry starting from 1997. Since the use of automated cancer registration is increasing, it is important to assess the quality and completeness of the automated data being produced. In this study, we assessed the completeness of the automatically generated data by comparison with a gold standard of all cases identified by manual and automatic systems for the year 1997, when the automated system was introduced and the manual system was still in operation. We also evaluated the efficiency of the automated system. 5027 cases were generated automatically; 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Sixty-nine cases (1.3%) were not recorded automatically, the most common reason (0.8%) being that the incidence record was dated 1998, even though the case was incident in 1997. A total of 98.7% of all cases found were picked up by the automated system. A completeness figure of 98.7% indicates that the automatic procedure is a valid alternative to manual methods for routine case generation. The fact that 59% of cases were registered automatically indicates that the system can speed up data production and enhance registry efficiency.
    Journal of Biomedical Informatics 03/2008; 41(1):24-32. DOI:10.1016/j.jbi.2007.03.003 · 2.48 Impact Factor