Is rigorous retrospective harmonization possible? Application of the DataSHaPER approach across 53 large studies

Research Institute - McGill University Health Centre, Montreal, Quebec, Canada.
International Journal of Epidemiology (Impact Factor: 9.2). 07/2011; 40(5):1314-28. DOI: 10.1093/ije/dyr106
Source: PubMed

ABSTRACT
Methods: This article examines the value of using the DataSHaPER for retrospective harmonization of established studies. Using the DataSHaPER approach, the potential to generate 148 harmonized variables from the questionnaires and physical measures collected in 53 large population-based studies (6.9 million participants) was assessed. Variable and study characteristics that might influence the potential for data synthesis were also explored.
Results: Out of all assessment items evaluated (148 variables for each of the 53 studies), 38% could be harmonized. Certain characteristics of variables (i.e. relative importance, individual targeted, reference period) and of studies (i.e. observational units, data collection start date and mode of questionnaire administration) were associated with the potential for harmonization. For example, for variables deemed to be essential, 62% of assessment items paired could be harmonized.
Conclusion: The current article shows that the DataSHaPER provides an effective and flexible approach for the retrospective harmonization of information across studies. To implement data synthesis, some additional scientific, ethico-legal and technical considerations must be addressed. The success of the DataSHaPER as a harmonization approach will depend on its continuing development and on the rigour and extent of its use. The DataSHaPER has the potential to take us closer to a truly collaborative epidemiology and offers the promise of enhanced research potential generated through synthesized databases.

  • Source
    ABSTRACT: Very large sample sizes are required for estimating effects which are known to be small, and for addressing intricate or complex statistical questions. This is often only achievable by pooling data from multiple studies, especially in genetic epidemiology where associations between individual genetic variants and phenotypes of interest are generally weak. However, the physical pooling of experimental data across a consortium is frequently prohibited by the ethico-legal constraints that govern agreements and consents for individual studies. Study level meta-analyses are frequently used so that data from multiple studies need not be pooled to conduct an analysis, though the resulting analysis is necessarily restricted by the available summary statistics. The idea of maintaining data security is also of importance in other areas and approaches to carrying out 'secure analyses' that do not require sharing of data from different sources have been proposed in the technometrics literature. Crucially, the algorithms for fitting certain statistical models can be manipulated so that an individual level meta-analysis can essentially be performed without the need for pooling individual-level data by combining particular summary statistics obtained individually from each study. DataSHIELD (Data Aggregation Through Anonymous Summary-statistics from Harmonised Individual levEL Databases) is a tool to coordinate analyses of data that cannot be pooled. In this paper, we focus on explaining why a DataSHIELD approach yields identical results to an individual level meta-analysis in the case of a generalised linear model, by simply using summary statistics from each study. It is also an efficient approach to carrying out a study level meta-analysis when this is appropriate and when the analysis can be pre-planned. We briefly comment on the IT requirements, together with the ethical and legal challenges which must be addressed.
  • Source
    ABSTRACT: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK's proposed '' initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data.
  • Source
    ABSTRACT: Objective: To harmonize the collection of nonsurgical clinical and epidemiologic data relevant to endometriosis research, allowing large-scale collaboration. Design: An international collaboration involving 34 clinical/academic centers and three industry collaborators from 16 countries on five continents. Setting: In 2013, two workshops followed by global consultation, bringing together 54 leaders in endometriosis research. Patients: None. Intervention(s): Development of a self-administered endometriosis patient questionnaire (EPQ), based on [1] systematic comparison of questionnaires from eight centers that collect data from endometriosis cases (and controls/comparison women) on a medium to large scale (publication on >100 cases); [2] literature evidence; and [3] several global consultation rounds. Main Outcome Measure(s): Standard recommended and minimum required questionnaires to capture detailed clinical and covariate data. Result(s): The standard recommended (EPHect EPQ-S) and minimum required (EPHect EPQ-M) questionnaires contain questions on pelvic pain, subfertility and menstrual/reproductive history, hormone/medication use, medical history, and personal information. Conclusion(s): The EPQ captures the basic set of patient characteristics and exposures considered by the WERF EPHect Working Group to be most critical for the advancement of endometriosis research, but is also relevant to other female conditions with similar risk factors and/or symptomatology. The instruments will be reviewed based on feedback from investigators, and, after a first review after 1 year, triannually through systematic follow-up surveys. Updated versions will be made available through
    Fertility and Sterility 09/2014; 102(5). DOI:10.1016/j.fertnstert.2014.07.1244 · 4.30 Impact Factor
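
The first related abstract above states that a generalised linear model can be fitted from per-study summary statistics, giving the same result as fitting it to the pooled individual-level data. The following is a minimal sketch of that idea for logistic regression fitted by Newton-Raphson, not the DataSHIELD implementation or its API. The function names (`local_summaries`, `solve`, `fit_logistic`) and the fixed iteration count are assumptions made for illustration. In each iteration a study returns only its score vector and information matrix, and the coordinating analysis sums them before updating the coefficients.

```python
import math

def local_summaries(X, y, beta):
    """Run inside one study: return only the score vector and information
    matrix for the current coefficients, never the individual rows."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))   # fitted probability
        w = mu * (1.0 - mu)                 # variance weight
        for j in range(p):
            score[j] += xi[j] * (yi - mu)
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(studies, p, iterations=25):
    """Coordinating side: sum the per-study summaries, then take a Newton step."""
    beta = [0.0] * p
    for _ in range(iterations):
        total_score = [0.0] * p
        total_info = [[0.0] * p for _ in range(p)]
        for X, y in studies:
            s, I = local_summaries(X, y, beta)
            for j in range(p):
                total_score[j] += s[j]
                for k in range(p):
                    total_info[j][k] += I[j][k]
        step = solve(total_info, total_score)
        beta = [b + d for b, d in zip(beta, step)]
    return beta

# Two hypothetical studies; the first column is the intercept term.
study_a = ([[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]], [0, 1, 0, 1])
study_b = ([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]], [0, 0, 1, 1])

distributed = fit_logistic([study_a, study_b], p=2)
pooled = fit_logistic([(study_a[0] + study_b[0], study_a[1] + study_b[1])], p=2)
print(distributed, pooled)
```

Because the score and information matrix are sums over participants, adding the per-study contributions reproduces the pooled quantities, so both fits follow the same Newton steps up to floating-point rounding. The toy data are invented and chosen so that the outcomes are not perfectly separable, which keeps the maximum-likelihood estimate finite.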

