Article

Design of a national distributed health data network

Abstract

A distributed health data network is a system that allows secure remote analysis of separate data sets, each comprising a different medical organization's or health plan's records. Distributed health data networks are currently being planned that could cover millions of people, permitting studies of comparative clinical effectiveness, best practices, diffusion of medical technologies, and quality of care. These networks could also support assessment of medical product safety and other public health needs. Distributed network technologies allow data holders to control all uses of their data, which overcomes many practical obstacles related to confidentiality, regulation, and proprietary interests. Some of the challenges and potential methods of operation of a multipurpose, multi-institutional distributed health data network are described.
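To make the distributed model concrete, the sketch below (Python, with entirely hypothetical names and toy data) shows the basic round-trip: a coordinating center issues a query, each site executes it behind its own firewall, and only aggregate counts leave the site.

```python
# Minimal sketch of a distributed query round-trip (all names hypothetical).
# Each data partner runs the query locally and returns only an aggregate
# count; patient-level records never leave the partner's environment.
from dataclasses import dataclass

@dataclass
class Query:
    condition_code: str  # e.g., a diagnosis code of interest
    year: int

def run_locally(query, local_records):
    """Runs inside each data partner's firewall; only the count is released."""
    n = sum(1 for r in local_records
            if r["code"] == query.condition_code and r["year"] == query.year)
    return {"count": n}

# The coordinating center distributes the query and sums the per-site aggregates.
site_a = [{"code": "I10", "year": 2009}, {"code": "E11", "year": 2009}]
site_b = [{"code": "I10", "year": 2009}]
q = Query(condition_code="I10", year=2009)
print(sum(run_locally(q, site)["count"] for site in (site_a, site_b)))  # 2
```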

... 63,117,121,151 to function as research laboratories of primary care, 63,86,102 or as organizations that conducted research focused on the priorities of the healthcare system. 63,86,123 In some other countries, PBRNs built on their linkages to the broader healthcare system with objectives for optimizing the quality and efficiency of healthcare 28,33,39,44,50,51,62,64,67,80,81,84-86,105,111,113,120,122-133 and for timely knowledge translation and dissemination. 25,53,55,67,70,80,85,89,95,97,114,128,129,135-139 Subtheme: Institutional/Governmental Support, National/State Policy, and Regulatory Environment. Many PBRNs provided an interface for collaborative effort with governmental bodies and institutions. ...
... 22,86,104,114,125,154,157 In other cases, PBRNs were influenced by national initiatives and policies that either supported networks at their inception 45,47-49,52,63,68,69,72,75,79,102,104,109,120,155-157,161,165,184,250 or later, 32,36,52,75,138,158-162 or were linked to their transformation, 63,85,123,163,164 or to the dissolution of some PBRNs. 165,166 The regulatory environment in different countries facilitated commitment to long-term funding to help improve evidence-based practice and research capacity in primary care, 46-49,63,68,69,72,78,79,86,102,123,149,155-159 shored up indirect support through the development of agencies that became pivotal supporters of PBRNs, 32,34,35,40,51,54,87,89,128,139,141,158,159,161-163,188 or positively influenced the research impact of PBRNs 34,39,55,82,108,113,128,132,169,171,172 and reinforced the patient-centeredness of their research. 38,39,56,57,62,70,73-75,87,113,128 Subtheme: Professional Organizations. National or local-level professional organizations identified a need for collaborative practice-based research and helped establish it as part of the development of the academic discipline. 5,11,22,28,89,92,99,110,136,141,143,148,150,175,188 A number of networks were initiated by professional organizations 26,92,136,141,148,166,167,176-179 alone or in collaboration with academic departments and/or (research) institutes, 28,64,77,82,85,92,97,109,119,130,134,178 to link PBRNs to education and professional development. ...
... 9,11,83,116,125,126,180 Contributions of HIT sustained the development of PBRN infrastructure, either directly, by empowering networks to meet their growing research needs, or indirectly, when the use of a specific EHR was required for PBRN membership. 36,38,53,73,81-83,109,130,132,133,138,146,185,191-193,197,236 Many articles stated that PBRNs leveraged the potential of EHRs for healthcare data standardization, motivated HIT vendors to improve the quality of EHRs, and developed tools that facilitated data extraction and sharing, 53,57,61,81,106,110,114,128,130,133,138,185,186,192-197 clinical decision-making, learning communities, and quality improvement activities. 40,51,53,55,57,61,105,106,109,114,130,133,138,197 PBRNs, in collaboration with vendors, gave rise to numerous innovative HIT applications, for example, the technologies developed by e-PCRN, 105,138,160,199 the shared EHR of OCHIN, 32,53,107 and the data-driven CPCSSN infrastructure. ...
Article
Full-text available
Background: This article is the second part of a novel scoping review of the international literature that presents the key elements underpinning the foundational activities of Practice-Based Research Networks (PBRNs). In this article, we examine the external environment and the intersection between the internal and external environment domains. Methods: We searched electronic databases, including MEDLINE (PubMed), OVID, CINAHL (EBSCOhost), Scopus, and SAGE for publications in English between 1/1/1965 and 9/15/2021. We also searched reference lists of selected publications, gray literature, and other online sources. Inductive thematic analysis was applied to construct the main themes, subthemes, and key elements from a scoping review covering up to 10 years of reported experiences of each of the 98 PBRNs that met the inclusion criteria. Results: In this study we present 2 main themes: "Stakeholders at the Intersection Between the Internal and External Environment" and the "External Environment." The first is linked to the subthemes "Patient and Community Stakeholders" and "Other Healthcare Stakeholders" and 11 key elements. The second relates to the subthemes "National Health System," "Institutional/Governmental Support, National/State Policy and Regulatory Environment," "Professional Organizations," "Leveraging Previous Research and PBRN Experience and Interacting with Other Networks," and "Health Information Technology (HIT) and HIT Vendors," and 21 key elements. Conclusions: Despite variations in geography, time, and healthcare context, PBRNs shared many similar developmental experiences over the past 5 decades. Their external environment contributed significantly to their developmental trajectories during the first 10 years of their operation.
... Addressing these critical concerns requires comprehensive governance approaches. 15-18 This paper describes critical data partner, coordinating center, and researcher needs as they relate to governance and operations of distributed health data networks, and how the PopMedNet software platform helps address those operational and governance matters. The paper focuses on specific approaches to and challenges of distributed network implementation and use with PopMedNet; others have described general governance issues related to distributed networks. ...
... Query results are returned for final analysis. 17,18,24-30 This distributed model raises a series of important considerations for stakeholders such as data partners, network coordinating centers, and researchers. Table 1 summarizes the concerns described below. ...
... PopMedNet was designed to overcome data partners' security, operational, confidentiality, and privacy concerns. 18,24-27 PopMedNet uses a publish-and-subscribe approach that does not require any open ports or Virtual Private Networks (VPNs), eliminating a critical security concern for data partners. It is installed on a data partner end user's local machine, behind the data partner's firewall. ...
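The publish-and-subscribe pattern described in this excerpt can be illustrated with a small polling loop. This is a hedged sketch of the general pattern only, not the actual PopMedNet API or code; the portal URL and endpoints are invented. Because the data partner's client initiates every connection outbound, no inbound ports or VPNs are needed.

```python
# Sketch of an outbound-only publish-and-subscribe client (hypothetical
# endpoints). The client behind the firewall polls for pending queries and
# posts results back; the coordinating center never connects inward.
import json
import time
import urllib.request

PORTAL = "https://example.org/network"  # hypothetical coordinating-center URL

def poll_once():
    # Outbound request: fetch any queries published for this data partner.
    with urllib.request.urlopen(f"{PORTAL}/queries/pending") as resp:
        pending = json.load(resp)
    for query in pending:
        result = {"query_id": query["id"], "count": 0}  # run the query locally here
        body = json.dumps(result).encode()
        req = urllib.request.Request(f"{PORTAL}/results", data=body,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)  # outbound again: publish the result

while True:          # the data partner's client initiates every connection
    poll_once()
    time.sleep(300)  # poll interval chosen by the data partner
```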
Article
Full-text available
Introduction: The expanded availability of electronic health information has led to increased interest in distributed health data research networks. Distributed research network model: The distributed research network model leaves data with and under the control of the data holder. Data holders, network coordinating centers, and researchers have distinct needs and challenges within this model. Software-enabled governance (PopMedNet): The concerns of network stakeholders are addressed in the design and governance models of the PopMedNet software platform. PopMedNet features include distributed querying, customizable workflows, and auditing and search capabilities. Its flexible role-based access control system enables the enforcement of varying governance policies. Selected case studies: Four case studies describe how PopMedNet is used to enforce network governance models. Issues and challenges: Trust is an essential component of a distributed research network and must be built before data partners may be willing to participate further. The complexity of the PopMedNet system must be managed as networks grow and new data, analytic methods, and querying approaches are developed. Conclusions: The PopMedNet software platform supports a variety of network structures, governance models, and research activities through customizable features designed to meet the needs of network stakeholders.
... In cases where unconscious accident victims are brought in for treatment, a centralized medical record system would ensure that such patients do not have to be retested for their blood groups or allergies. Development of a multipurpose, multi-institutional distributed health data network would accelerate the development of a learning health care system [9]. This paper aims to develop a web-based application that stores patients' data in a central database and restricts access to the medical professionals who are authorized to use patients' data judiciously. ...
... Maro et al. [9] investigated ways to link the data that various healthcare providers hold. In that approach, the data providers retain control over their data, and there is no provision for sharing the medical data in real time. ...
Article
One of the key challenges confronting Africa's healthcare industry is the need to reduce unnecessary paperwork, boost the quality of medical records, and increase the overall quality of care. In most healthcare institutions in Africa, hours are spent filling out forms and creating new medical records for patients at every visit. Even healthcare providers who keep such records have them confined to their own hospitals. This has led to redundant storage of medical records across healthcare providers. This paper addresses the problems of inefficiency, redundancy, and time wastage in managing patients' medical records. The goal is to allow doctors across multiple hospitals quicker access to patients' records, enabling them to make faster, better-informed diagnoses and prescribe the right medications. Paper-based medical records and prescriptions have many shortcomings: they are expensive to copy, transport, and store; easy to destroy; and difficult to analyze or audit for who has seen them. Electronic patient encounters represent a quantum leap forward in legibility and the ability to rapidly retrieve information. This objective is achieved by pulling various technologies together into a centralized system for managing all these records efficiently.
... In this perspective, data management can be distributed at different points during its lifecycle. In a federated system, for example, data repositories remain under the control and at the location of the institution producing the data, although data queries can originate from any participating institution, which then handles its own analysis [40,45]. Alternatively, data repositories may be distributed but the results of data queries centralized (e.g., PaTH [21]). ...
... Since one cannot assume that relevant data are readily available, this layer identifies the LHS's data sources. Where more than one data source is leveraged, this layer also needs to address interoperability issues [35]: recognized standards allowing for syntactic interoperability (the ability of systems to exchange data), such as Health Level Seven (HL7) [52], and controlled vocabularies and ontologies allowing for semantic interoperability (the ability to automatically interpret exchanged information), such as SNOMED (Systematized Nomenclature of Medicine) and RxNORM (a terminology that contains all medications available on the US market) [45,53]. Data lifecycle management procedures and processes, including how data quality will be ensured [54], should also be defined in this layer. ...
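As a toy illustration of the semantic-interoperability point, the snippet below maps local source terms to standard vocabulary codes before data leave the ETL layer. The mapping tables are stand-ins for a real terminology service; the two codes are offered as real-world examples but should be verified against current SNOMED CT and RxNorm releases.

```python
# Toy illustration of semantic interoperability: local source values are
# mapped to standard vocabularies (SNOMED CT for conditions, RxNorm for
# drugs) before records are exchanged. The tables below are stand-ins for
# a full terminology service; verify codes against current releases.
SNOMED_MAP = {"heart attack": "22298006"}        # myocardial infarction
RXNORM_MAP = {"tylenol 325 mg tablet": "209387"} # an acetaminophen product

def standardize(record):
    """Replace free-text source values with standard vocabulary codes."""
    return {
        "condition_code": SNOMED_MAP.get(record["diagnosis"].lower()),
        "drug_code": RXNORM_MAP.get(record["medication"].lower()),
    }

print(standardize({"diagnosis": "Heart attack",
                   "medication": "Tylenol 325 mg Tablet"}))
# {'condition_code': '22298006', 'drug_code': '209387'}
```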
Article
Full-text available
Background The vision of transforming health systems into learning health systems (LHSs) that rapidly and continuously transform knowledge into improved health outcomes at lower cost is generating increased interest in government agencies, health organizations, and health research communities. While existing initiatives demonstrate that different approaches can succeed in making the LHS vision a reality, they are too varied in their goals, focus, and scale to be reproduced without undue effort. Indeed, the structures necessary to effectively design and implement LHSs on a larger scale are lacking. In this paper, we propose the use of architectural frameworks to develop LHSs that adhere to a recognized vision while being adapted to their specific organizational context. Architectural frameworks are high-level descriptions of an organization as a system; they capture the structure of its main components at varied levels, the interrelationships among these components, and the principles that guide their evolution. Because these frameworks support the analysis of LHSs and allow their outcomes to be simulated, they act as pre-implementation decision-support tools that identify potential barriers and enablers of system development. They thus increase the chances of successful LHS deployment. Discussion We present an architectural framework for LHSs that incorporates five dimensions—goals, scientific, social, technical, and ethical—commonly found in the LHS literature. The proposed architectural framework is comprised of six decision layers that model these dimensions. The performance layer models goals, the scientific layer models the scientific dimension, the organizational layer models the social dimension, the data layer and information technology layer model the technical dimension, and the ethics and security layer models the ethical dimension. We describe the types of decisions that must be made within each layer and identify methods to support decision-making. Conclusion In this paper, we outline a high-level architectural framework grounded in conceptual and empirical LHS literature. Applying this architectural framework can guide the development and implementation of new LHSs and the evolution of existing ones, as it allows for clear and critical understanding of the types of decisions that underlie LHS operations. Further research is required to assess and refine its generalizability and methods.
... It is common to pool individual-level data from multiple sources to increase sample size and improve generalizability of the study findings. However, sharing detailed individual-level information raises concerns about individual privacy and confidentiality, which may deter multi-center collaborations (Maro et al. 2009; Brown et al. 2010; Toh et al. 2011). Data organized in a distributed data network (DDN), where data remain behind each data partner's firewall, alleviates some of these concerns (Diamond, Mostashari, and Shirky 2009; Maro et al. 2009; Brown et al. 2010; Toh et al. 2011). ...
... However, sharing detailed individual-level information raises concerns about individual privacy and confidentiality, which may deter multi-center collaborations (Maro et al. 2009; Brown et al. 2010; Toh et al. 2011). Data organized in a distributed data network (DDN), where data remain behind each data partner's firewall, alleviates some of these concerns (Diamond, Mostashari, and Shirky 2009; Maro et al. 2009; Brown et al. 2010; Toh et al. 2011). Several analytic methods are available to perform statistical analysis within DDNs, but methods that only require summary-level information are increasingly preferred because they offer additional privacy protection (Toh et al. 2011; Rassen et al. 2013). ...
Preprint
Full-text available
Previous work has demonstrated the feasibility and value of conducting distributed regression analysis (DRA), a privacy-protecting analytic method that performs multivariable-adjusted regression analysis with only summary-level information from participating sites. To our knowledge, there are no DRA applications in SAS, the statistical software used by several large national distributed data networks (DDNs), including the Sentinel System and PCORnet. SAS/IML is available to perform the required matrix computations for DRA in the SAS system. However, not all data partners in these large DDNs have access to SAS/IML, which is licensed separately. In this first article of a two-paper series, we describe a DRA application developed for use in Base SAS and SAS/STAT modules for linear and logistic DRA within horizontally partitioned DDNs and its successful tests.
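The core idea behind linear DRA can be shown in a few lines: each site releases only the summary matrices X'X and X'y, and the coordinating center recovers exactly the pooled least-squares fit. The papers above implement this in SAS; the numpy sketch below merely illustrates the underlying algebra with simulated data.

```python
# Sketch of linear distributed regression analysis (DRA): each site shares
# only the summary matrices X'X and X'y, yet the pooled least-squares
# solution is recovered exactly at the coordinating center.
import numpy as np

rng = np.random.default_rng(0)

def site_summaries(X, y):
    """Summary-level information released by a site; no patient rows leave."""
    return X.T @ X, X.T @ y

# Simulate two sites drawn from the same model y = 1 + 2x + noise.
sites = []
for _ in range(2):
    x = rng.normal(size=100)
    X = np.column_stack([np.ones(100), x])
    y = 1 + 2 * x + rng.normal(scale=0.1, size=100)
    sites.append((X, y))

# The coordinating center sums the summaries and solves the normal equations.
XtX = sum(site_summaries(X, y)[0] for X, y in sites)
Xty = sum(site_summaries(X, y)[1] for X, y in sites)
beta = np.linalg.solve(XtX, Xty)
print(beta)  # close to [1. 2.], identical to pooled individual-level OLS
```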
... Sharing of detailed individual-level information raises concerns about individual privacy and confidentiality, which may deter multi-center collaborations (Maro et al. 2009, Brown et al. 2010, Toh et al. 2011). Data organized in a distributed data network (DDN), where data remain behind each data partner's firewall, alleviates some of these concerns (Diamond, Mostashari, and Shirky 2009, Maro et al. 2009, Brown et al. 2010, Toh et al. 2011). ...
... Sharing of detailed individual-level information raises concerns about individual privacy and confidentiality, which may deter multi-center collaborations (Maro et al. 2009, Brown et al. 2010, Toh et al. 2011). Data organized in a distributed data network (DDN), where data remain behind each data partner's firewall, alleviates some of these concerns (Diamond, Mostashari, and Shirky 2009, Maro et al. 2009, Brown et al. 2010, Toh et al. 2011). Distributed regression analysis (DRA) within a DDN is one approach that can help overcome privacy concerns, allowing multivariable regression analysis using only summary-level information and producing results equivalent to those from pooled individual-level data analysis (Fienberg et al. 2006, Wolfson et al. 2010, Wu et al. 2012, Toh et al. 2014, Dankar 2015). ...
Preprint
Full-text available
Previous work has demonstrated the feasibility and value of conducting distributed regression analysis (DRA), a privacy-protecting analytic method that performs multivariable-adjusted regression analysis with only summary-level information from participating sites. To our knowledge, there are no DRA applications in SAS, the statistical software used by several large national distributed data networks (DDNs), including the Sentinel System and PCORnet. SAS/IML is available to perform the required matrix computations for DRA in the SAS system. However, not all data partners in these large DDNs have access to SAS/IML, which is licensed separately. In this second article of a two-paper series, we describe a DRA application developed using Base SAS and SAS/STAT modules for distributed Cox proportional hazards regression within horizontally partitioned DDNs and its successful tests.
... The centralized data repository approach is often considered the ideal method for MPES [40,41]; that is, a coordinating research group collects all the data from participating countries and performs the analyses. Its strengths are efficiency, flexibility, and quality of analyses, because the coordinating group's analysts are able to access all the data directly and to generate results. ...
... On the other hand, the distributed network approach using a CDM can protect data privacy in MPES [27]. The key steps of this approach are [28,40,42]: (1) standardize the definition and format of the data and develop a CDM that covers all information required for the studies; (2) convert the databases into the CDM at each participating site or country; (3) execute a single program, created by the coordinating center, at all centers separately; (4) send the summarized results to the coordinating center; and (5) combine the results across centers using, e.g., meta-analytic techniques. This approach controls the relative quality of the center-specific analyses by implementing a single analytical program, leading to a consistent analysis across centers [42,43]. ...
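A minimal sketch of step (5), assuming each center returns an estimate and standard error on a common scale (e.g., a log hazard or odds ratio), is a fixed-effect inverse-variance meta-analysis; the numbers below are made up.

```python
# Fixed-effect inverse-variance meta-analysis combining site-specific
# results, as in step (5) above. Estimates are on a log scale; all values
# are invented for illustration.
import math

site_results = [  # (estimate, standard error) from each center
    (0.25, 0.10),
    (0.18, 0.15),
    (0.30, 0.12),
]

weights = [1 / se**2 for _, se in site_results]           # inverse variances
pooled = sum(w * est for (est, _), w in zip(site_results, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled estimate = {pooled:.3f} +/- {1.96 * pooled_se:.3f}")
```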
Article
Full-text available
With the rapid progress of computer technology and the development of electronic health records, pharmacoepidemiologic studies using multiple databases across and within nations have become feasible and popular in recent years. Multinational pharmacoepidemiologic studies (MPES) provide opportunities to compare drug utilization and effects across countries, to study rare exposures and outcomes, and to collaborate across nations. Multiple networks in North America, Europe, and the Asia/Pacific region have emerged to support MPES and national and cross-national collaborations. We highlight the challenges of MPES and respective solutions with examples, including non-database pharmacoepidemiologic studies for low-resource countries, distributed network approaches to address data privacy issues, and considerations for biases due to variations in data structure, coding, and the practices/behaviors of health care providers and patients. Because there are no standard recommendations for designing and conducting MPES, transparency is key to the adequate conduct and interpretation of these studies. More effort in developing and standardizing methods for MPES is required.
... In the last ten to fifteen years, reusable distributed data infrastructures that leverage real-world data have emerged to generate postmarket evidence on the benefits and risks of medical products (7). These infrastructures, embedded in research-oriented healthcare ...
... This code can be posted for transparency and replicability against databases formatted into the common data model structure. While these site-specific patient-level analytic datasets could be pooled together for analysis, they are typically maintained at each site using a distributed network (7). Summary-level information is typically shared to complete the final analysis, or meta-analytic techniques are employed to summarize the site-specific results (11). ...
Article
At the time medical products are approved, we rarely know enough about their comparative safety and effectiveness vis-à-vis alternative therapies to advise patients and providers. Postmarket evidence generation to study rare adverse events following medical product exposure increasingly requires analysis of millions of longitudinal patient records that can provide complete capture of patient experiences. In the article by Pradhan et al. (Am J Epidemiol. Glucagon-Like Peptide-1 Receptor Agonists and Risk of Anaphylactic Reaction Among Patients With Type 2 Diabetes: Multisite Population-Based Cohort Study), the authors demonstrate how observational database studies are often the most practical approach, provided these databases are carefully chosen to be fit for purpose. Distributed data networks with common data models have proliferated in the last two decades in pharmacoepidemiology, allowing efficient capture of patient data in standardized and structured format across disparate real-world data sources. Use of common data models facilitates transparency by allowing standardized programming approaches that can be easily reproduced. The distributed data network architecture, combined with a common data approach, supports not only multi-site observational studies but also pragmatic clinical trials. It also helps bridge international boundaries and further increases sample size and diversity of study population.
... 6,7 In the US, Health Information Technology (HIT) for the exchange of medical data is also under development. 8,9,10 The health department has taken the initiative to set up a National Health Information Network (NHIN) for the electronic exchange of medical data among different organizations within a region, community, or hospital system. 10,11,12 The US HITECH Act provides funds to establish a national health IT setup, in which patients' data are sent across a national health information highway. ...
Article
Full-text available
Introduction: National Health Information Exchange (NHIX) systems are rapidly evolving. Due to the cyber infrastructure and improvements in communication technology, it is possible to share healthcare-related data within a geographic region electronically among healthcare-related autonomous entities such as physicians, hospitals, test laboratories, insurers, emerging Health Information Organizations (HIO), and even government departments. Study Design: Whether data are collected through RCTs, quasi-experimentation, or triangulation, we explore a NHIX system for EHRs that has also been implemented as a test case. We particularly propose to demonstrate a concept application, Medical Drop Box (MDB), with the key technological components of a future NHIX system for the medical industry. Setting: Data from different medical settings have been used for testing the new system, but the technological development has been done at IIU, Islamabad. Period: The proposed system is not time bound in terms of data collection; it can handle data collected in any chunk of time in the past and can provide information as and when needed in the future. Material & Methods: With MDB, a person is able to collect his/her health data and share it with the whole medical industry according to his/her own preferences and settings. Besides the technology for handling numerous forms of healthcare data, the main challenge of a NHIX system is to allow individuals and associated medical entities to manage and share their medical information based on the personal control and preferences given to each by medical laws, information rights, and privacy rules. The main focus of this research paper is to make a standard medical application for medical data in an exchangeable format according to the standards defined in HL7. Results: The new system is able to produce standardized clinical documents for medical data in an exchangeable format according to the HL7 standard. The MDB is the first step in setting up a NHIX system. With the help of the MDB “Statistical Analyzer,” the health industry of the country can now perform a variety of analyses for future improvements in different health settings. Conclusions: The availability of patients' medical data in the MDB cloud has improved clinical impact, created new business and service opportunities, and reduced overall treatment cost.
... Although each system is unique, the challenges associated with siloed data are consistent across the globe. 4-7 Together, the individual investments in each of these networks can be leveraged to expand overall capabilities across funding agencies and the broader public health community, improve opportunities to generate shareable knowledge, and provide extensible infrastructure for the development of LHS. 8-13 Broadly, the goal of these networks is to create multisite, multiuse network structures and governance to facilitate the implementation of studies using real-world data to generate real-world evidence. ...
Article
Full-text available
Introduction Existing large‐scale distributed health data networks are disconnected even as they address related questions of healthcare research and public policy. This paper describes the design and implementation of a fully functional prototype open‐source tool, the Cross‐Network Directory Service (CNDS), which addresses much of what keeps distributed networks disconnected from each other. Methods The set of services needed to implement a Cross‐Directory Service was identified through engagement with stakeholders and workgroup members. CNDS was implemented using PCORnet and Sentinel network instances and tested by participating data partners. Results Web services that enable the four major functional features of the service (registration, discovery, communication, and governance) were developed and placed into an open‐source repository. The services include a robust metadata model that is extensible to accommodate a virtually unlimited inventory of metadata fields, without requiring any further software development. The user interfaces are programmatically generated based on the contents of the metadata model. Conclusion The CNDS pilot project gathered functional requirements from stakeholders and collaborating partners to build a software application to enable cross‐network data and resource sharing. The two partners—one from Sentinel and one from PCORnet—tested the software. They successfully entered metadata about their organizations and data sources and then used the Discovery and Communication functionality to find data sources of interest and send a cross‐network query. The CNDS software can help integrate disparate health data networks by providing a mechanism for data partners to participate in multiple networks, share resources, and seamlessly send queries across those networks.
... PCORnet Common Data Model. It is necessary for any distributed data network to agree on common data structures and semantics [16]. For PCORI, this agreement takes the form of the PCORnet Common Data Model (CDM) [17]. ...
Article
Full-text available
Terminology services serve an important role supporting clinical and research applications, and underpin a diverse set of processes and use cases. Through standardization efforts, terminology service-to-system interactions can leverage well-defined interfaces and predictable integration patterns. Often, however, users interact more directly with terminologies, and no such blueprints are available for describing terminology service-to-user interactions. In this work, we explore the main architecture principles necessary to build a user-centered terminology system, using an Extract-Transform-Load process as our primary usage scenario. To analyze our architecture, we present a prototype implementation based on the Common Terminology Services 2 (CTS2) standard using the Patient-Centered Network of Learning Health Systems (LHSNet) project as a concrete use case. We perform a preliminary evaluation of our prototype architecture using three architectural quality attributes: interoperability, adaptability and usability. We find that a design-time focus on user needs, cognitive models, and existing patterns is essential to maximize system utility.
... In order to address site-specific differences, data-sharing networks define a CDM that delineates the single set of data structures and values allowed for each variable. 8,10-13 Data contributors are required to transform their local data into the CDM structures in accordance with the precise definitions provided by the CDM developers. In addition to organizational and regulatory requirements, there are numerous technical processes associated with creating a CDM from an existing clinical system. ...
... This approach is much like that used by the FDA's Sentinel program for evaluating drugs and is the next step in creating a large EHR-based DDN for evaluating all medical devices. 32,33 ...
Article
Full-text available
Objectives: To support development of a robust postmarket device evaluation system using real-world data (RWD) from electronic health records (EHRs) and other sources, employing unique device identifiers (UDIs) to link to device information. Methods: To create consistent device-related EHR RWD across 3 institutions, we established a distributed data network and created UDI-enriched research databases (UDIRs) employing a common data model comprising 24 tables and 472 fields. To test the system, patients receiving coronary stents between 2010 and 2019 were loaded into each institution's UDIR to support distributed queries without sharing identifiable patient information. The ability of the system to execute queries was tested with 3 quality assurance checks. To demonstrate face validity of the data, a retrospective survival study of patients receiving zotarolimus or everolimus stents from 2012 to 2017 was performed using distributed analysis. Propensity score matching was used to compare risk of 6 cardiovascular outcomes within 12 months postimplantation. Results: The test queries established network functionality. In the analysis, we identified 9141 patients (Mercy = 4905, Geisinger = 4109, Intermountain = 127); mean age 65 ± 12 years, 69% males, 23% zotarolimus. Separate matched analyses at the 3 institutions showed hazard ratio estimates (zotarolimus vs everolimus) of 0.85-1.59 for subsequent percutaneous coronary intervention (P = .14-.52), 1.06-2.03 for death (P = .16-.78) and 0.94-1.40 for the composite endpoint (P = .16-.62). Discussion: The analysis results are consistent with clinical studies comparing these devices. Conclusion: This project shows that multi-institutional data networks can provide clinically relevant real-world evidence via distributed analysis while maintaining data privacy.
... A number of NHIX systems are currently being deployed in the USA under the federal initiative as the key cyber infrastructure for healthcare [30]. The health information technology (HIT) infrastructure is also under development in the United States [23,38]. Its cornerstone is the national health information network (NHIN) initiative, which will create a national health information exchange (NHIX) system in the US [36,38]. ...
Article
Full-text available
Telecare medicine information systems (TMISs) provide a platform for participating medical entities to share medical data over an insecure public channel. A medical drop box (MDB) is used for this purpose, in which an electronic health record (EHR) is maintained for national health information exchange (NHIX). The EHR is a crucial part of the MDB; therefore, the main challenge in NHIX is to restrict MDB access to authenticated entities only. Very recently, Moon et al. introduced a biometrics-based authentication scheme using chaotic maps for TMISs. The authors claimed that their scheme is efficient and robust in terms of its usage and implementation. However, this paper unveils that, due to the storage of a verifier table on the server, their scheme has scalability and efficiency issues. Furthermore, the use of the same parameters IM1 and IM2 during different login requests makes the scheme traceable. Therefore, an improved scheme using chaotic maps is proposed in this paper, which provides user anonymity and untraceability along with computational efficiency. The security of the proposed scheme is evaluated in detail through the random oracle model. The analysis reveals that the proposed scheme is robust and secure against known attacks. Moreover, the analysis is further verified with the popular automated tool ProVerif.
... The Sentinel program uses a distributed data system in which data are stored locally under the control of the 18 participating Data Partners, which contribute, and regularly update, administrative claims and clinical information in a common data model. 1,5-7 A key component of the existing Sentinel analytic framework is a set of customizable modular programs that are compatible with the Sentinel common data model and enable FDA to perform analyses to evaluate associations between medical products and prespecified health outcomes of interest. These programs include tools to perform both self-controlled and cohort-type analyses using complex design and analysis strategies, including self-controlled risk interval analyses 8 and new-user cohort analyses with confounding adjustment via propensity score matching. ...
Article
The US Food and Drug Administration's Sentinel system has developed the capability to conduct active safety surveillance of marketed medical products in a large network of electronic healthcare databases. We assessed the extent to which the newly developed, semi-automated Sentinel Propensity Score Matching (PSM) tool could produce the same results as a customized protocol-driven assessment, which found an adjusted hazard ratio (HR) of 3.04 (95% confidence interval [CI], 2.81 to 3.27) comparing angioedema in patients initiating angiotensin-converting enzyme (ACE) inhibitors versus beta-blockers. Using data from 13 Data Partners between January 1, 2008 and September 30, 2013, the PSM tool identified 2,211,215 eligible ACE inhibitor and 1,673,682 eligible beta-blocker initiators. The tool produced a HR of 3.14 (95% CI, 2.86 to 3.44). This comparison provides initial evidence that Sentinel analytic tools can produce findings similar to those produced by a highly customized protocol-driven assessment.
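The general technique the Sentinel PSM tool automates, 1:1 propensity-score matching within a caliper, can be sketched as follows. This is an illustrative greedy nearest-neighbor matcher on made-up scores, not Sentinel code; in practice the scores would come from a logistic model of treatment on confounders.

```python
# Hedged sketch of 1:1 propensity-score matching within a caliper
# (illustrative only; not the Sentinel PSM tool). Scores are simulated
# rather than estimated from a treatment model.
import numpy as np

rng = np.random.default_rng(1)
ps_treated = rng.uniform(0.2, 0.8, size=5)    # propensity scores, treated
ps_control = rng.uniform(0.1, 0.9, size=20)   # propensity scores, comparators
caliper = 0.05                                # max allowed score difference

available = np.ones(len(ps_control), dtype=bool)
pairs = []
for i, p in enumerate(ps_treated):            # greedy nearest-neighbor match
    dist = np.abs(ps_control - p)
    dist[~available] = np.inf                 # each comparator used at most once
    j = int(np.argmin(dist))
    if dist[j] <= caliper:
        pairs.append((i, j))
        available[j] = False

print(pairs)  # matched (treated, control) index pairs within the caliper
```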
Article
Full-text available
Background: Contributing health data to national, regional, and local networks or registries requires data stored in local systems with local structures and codes to be extracted, transformed, and loaded into a standard format called a Common Data Model (CDM). These processes, called Extract, Transform, Load (ETL), require data partners or contributors to invest in costly technical resources with specialized skills in data models, terminologies, and programming. Given the wide range of tasks, skills, and technologies required to transform data into a CDM, a classification of ETL challenges can help identify needed resources, which in turn may encourage data partners with less technical capability to participate in data-sharing networks. Methods: We conducted key-informant interviews with data partner representatives to survey the ETL challenges faced in clinical data research networks (CDRNs) and registries. A list of ETL challenges, organized into six themes, was vetted during a one-day workshop with a wide range of network stakeholders, including data partners, researchers, and policy experts. Results: We identified 24 technical ETL challenges related to the data-sharing process. All of these ETL challenges were rated as "important" or "very important" by workshop participants using a five-point Likert scale. Based on these findings, a framework for categorizing ETL challenges according to ETL phases, themes, and levels of data network participation was developed. Conclusions: Overcoming technical ETL challenges requires significant investment in a broad array of information technologies and human resources. Identifying these technical obstacles can inform optimal resource allocation to minimize the barriers and cost of entry for new data partners into extant networks, which in turn can expand data networks' inclusiveness and diversity. This paper offers pertinent information and a guiding framework relevant for data partners in ascertaining the challenges associated with contributing data to data networks.
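A hedged sketch of a single ETL transform step may help fix ideas: one row of a local source table is reshaped into a CDM-style target row with standard column names and coded values. The field names below are illustrative rather than drawn from any specific CDM release.

```python
# Sketch of one Transform step in an ETL pipeline: a local source row is
# reshaped into a hypothetical CDM target row. Field names are invented
# for illustration, not taken from a specific CDM specification.
SEX_MAP = {"M": "Male", "F": "Female"}  # local code -> CDM vocabulary

def to_cdm(local_row):
    """Transform one row of a local patient table into the CDM layout."""
    return {
        "PATID": str(local_row["patient_number"]),
        "BIRTH_DATE": local_row["dob"],            # already ISO-8601 here
        "SEX": SEX_MAP.get(local_row["gender"], "Unknown"),
    }

source = {"patient_number": 1042, "dob": "1970-05-01", "gender": "F"}
print(to_cdm(source))
# {'PATID': '1042', 'BIRTH_DATE': '1970-05-01', 'SEX': 'Female'}
```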
... It has been suggested that tumors expressing more than one basal keratin are more likely to have a dysfunctional BRCA1 pathway [131]. Consistent with this, several other studies have also suggested the predictive value of basal keratins for BRCA1 mutation [132,133]. Preclinical models of tumors with dysfunctional BRCA1 have been shown to be exclusively sensitive to cross-linking agents and inhibitors of poly(ADP-ribose) polymerase [134], suggesting an efficient therapeutic approach for tumors of this class. ...
Article
Breast cancer is a complex disease encompassing multiple tumor entities, each characterized by distinct morphology, behavior, and clinical implications. Besides the estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2, novel biomarkers have shown prognostic and predictive value, complicating our understanding of the heterogeneity of such cancers. Ten cancer hallmarks have been proposed by Weinberg to characterize cancer and its carcinogenesis. By reviewing biomarkers and breast cancer molecular subtypes, we propose that the divergent outcomes observed for patients stratified by hormone status are driven by different cancer hallmarks. 'Sustaining proliferative signaling' further differentiates cancers with positive hormone receptors. 'Activating invasion and metastasis' and 'evading immune destruction' drive the differentiation of triple-negative breast cancers. 'Resisting cell death', 'genome instability and mutation', and 'deregulating cellular energetics' refine breast cancer classification with their predictive values. 'Evading growth suppressors', 'enabling replicative immortality', 'inducing angiogenesis', and 'tumor-promoting inflammation' have not been involved in breast cancer classification and need more focus in future biomarker-related research. This review is novel in its global view of breast cancer heterogeneity, which clarifies many confusions in this field and contributes to precision medicine.
... However, this practice is often not feasible because patient-level data, even after de-identification, cannot be shared across institutions due to the privacy of patients' health information. One approach to addressing this challenge is distributed health data networks (DHDNs; Maro et al., 2009), through which information about the data can be shared among participants, but sharing of individual patient-level data is not allowed. One example of a DHDN is pSCANNER (Ohno-Machado et al., 2014), which includes 13 data sites covering 37 million patients and has developed a suite of software tools for privacy-preserving distributed data analysis. ...
Preprint
Electronic health records (EHRs) offer great promise for advancing precision medicine and, at the same time, present significant analytical challenges. In particular, it is often the case that patient-level data in EHRs cannot be shared across institutions (data sources) due to government regulations and/or institutional policies. As a result, there is growing interest in distributed learning over multiple EHR databases without sharing patient-level data. To tackle such challenges, we propose a novel communication-efficient method that aggregates the local optimal estimates by turning the problem into a missing-data problem. In addition, we propose incorporating posterior samples from remote sites, which can provide partial information on the missing quantities and improve the efficiency of parameter estimates while having the differential privacy property, thus reducing the risk of information leakage. The proposed approach, without sharing raw patient-level data, allows for proper statistical inference and can accommodate sparse regressions. We provide a theoretical investigation of the asymptotic properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses in comparison with several recently developed methods.
... Further, the significant computational burden associated with storing and analyzing massive datasets makes data centralization less appealing in some settings. As a result of these restrictions and concerns, there has been much recent interest in distributed clinical research networks (CRNs), multi-site distributed data networks that allow for analyses across institutions without the need for data centralization 7,8. In a CRN, each individual institution or health system maintains control over its own data, drastically reducing the risk of violating patient privacy by avoiding IPD exchange. ...
Article
Full-text available
Clinical research networks (CRNs), made up of multiple healthcare systems each with patient data from several care sites, are beneficial for studying rare outcomes and increasing generalizability of results. While CRNs encourage sharing aggregate data across healthcare systems, individual systems within CRNs often cannot share patient-level data due to privacy regulations, prohibiting multi-site regression which requires an analyst to access all individual patient data pooled together. Meta-analysis is commonly used to model data stored at multiple institutions within a CRN but can result in biased estimation, most notably in rare-event contexts. We present a communication-efficient, privacy-preserving algorithm for modeling multi-site zero-inflated count outcomes within a CRN. Our method, a one-shot distributed algorithm for performing hurdle regression (ODAH), models zero-inflated count data stored in multiple sites without sharing patient-level data across sites, resulting in estimates closely approximating those that would be obtained in a pooled patient-level data analysis. We evaluate our method through extensive simulations and two real-world data applications using electronic health records: examining risk factors associated with pediatric avoidable hospitalization and modeling serious adverse event frequency associated with a colorectal cancer therapy. In simulations, ODAH produced bias less than 0.1% across all settings explored while meta-analysis estimates exhibited bias up to 12.7%, with meta-analysis performing worst in settings with high zero-inflation or low event rates. Across both applied analyses, ODAH estimates had less than 10% bias for 18 of 20 coefficients estimated, while meta-analysis estimates exhibited substantially higher bias. Relative to existing methods for distributed data analysis, ODAH offers a highly accurate, computationally efficient method for modeling multi-site zero-inflated count data.
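The hurdle decomposition that ODAH builds on, a logistic component for zero versus nonzero counts plus a zero-truncated Poisson for the positive counts, can be sketched as a per-observation log-likelihood. This illustrates the model family only, not the ODAH algorithm itself; the parameter values are arbitrary.

```python
# Sketch of the hurdle model underlying ODAH-style analyses (not the ODAH
# algorithm): a Bernoulli "hurdle" for zero vs. positive counts, and a
# zero-truncated Poisson for the positive part. One-site log-likelihood.
import math

def hurdle_logl(y, p_pos, lam):
    """y: observed count; p_pos: P(Y > 0); lam: rate of the positive part."""
    if y == 0:
        return math.log(1 - p_pos)
    # Zero-truncated Poisson: Pois(y; lam) / (1 - exp(-lam)) for y >= 1.
    trunc = math.log(math.exp(-lam) * lam**y / math.factorial(y)) \
            - math.log(1 - math.exp(-lam))
    return math.log(p_pos) + trunc

counts = [0, 0, 3, 1, 0, 2]  # toy zero-inflated count data
print(sum(hurdle_logl(y, p_pos=0.4, lam=1.8) for y in counts))
```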
... In addition to these EHR-based initiatives to support clinical care, EHR-based data infrastructures have been established to facilitate clinical research. 21,23,29,30 Use of EHR data for this purpose provides a unique opportunity to address important questions that enhance the value of healthcare delivery for all of its key stakeholders. 31,32 Research infrastructures using EHR data include large, publicly funded initiatives such as the National Patient-Centered Clinical Research Network, public-private partnerships such as the High Value Healthcare Collaborative, and private initiatives such as Optum Labs. ...
Article
Full-text available
The learning healthcare system uses health information technology and the health data infrastructure to apply scientific evidence at the point of clinical care while simultaneously collecting insights from that care to promote innovation in optimal healthcare delivery and to fuel new scientific discovery. To achieve these goals, the learning healthcare system requires systematic redesign of the current healthcare system, focusing on 4 major domains: science and informatics, patient-clinician partnerships, incentives, and development of a continuous learning culture. This scientific statement provides an overview of how these learning healthcare system domains can be realized in cardiovascular disease care. Current cardiovascular disease care innovations in informatics, data uses, patient engagement, continuous learning culture, and incentives are profiled. In addition, recommendations for next steps for the development of a learning healthcare system in cardiovascular care are presented.
... In the United States, Health Information Technology (HIT) is also under development [3,4,5]. It is the initiative and foundation for the National Health Information Network (NHIN), which will create a National Health Information Exchange (NHIX) system for the United States [5,6]. ...
Article
Full-text available
The main objective of this research is to put medical data into an exchangeable format conforming to HL7 standards and make it available for access among medical entities according to the medical law of the country. We present and implement novel methods to generate the standard clinical document for the National Health Information Exchange (NHIX) system used in EHRs. However, due to the enormity of this problem, we in particular propose to demonstrate a concept application, Medical Drop Box (MDB), with the key technological components of a future NHIX system. With the use of MDB, a person can collect his/her healthcare data and share it with doctors in a seamless way, in conformance with a regulatory framework, from anywhere and at any time. Besides the technology for handling numerous forms of healthcare data, the main challenge of a NHIX system is to allow individuals and associated medical entities to manage and share their medical information based on the personal control and preferences given to each by medical laws, information rights, and privacy rules. Statistical information can be generated using the MDB, which will help health departments make future decisions. The standardized clinical document will also help the exchange of medical information in other parts of the world.
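As a rough illustration of what an HL7-style standardized clinical document looks like structurally, the sketch below assembles a heavily simplified CDA-like XML fragment. Element names are abbreviated for illustration and do not constitute a valid CDA instance; only the LOINC code shown is a real-world example.

```python
# Hedged sketch of assembling a minimal, CDA-like XML clinical document.
# The structure is heavily simplified and illustrative only; it is not a
# complete or schema-valid HL7 CDA document.
import xml.etree.ElementTree as ET

doc = ET.Element("ClinicalDocument", xmlns="urn:hl7-org:v3")
patient = ET.SubElement(doc, "recordTarget")
ET.SubElement(patient, "id", extension="1042")        # hypothetical patient id
ET.SubElement(patient, "name").text = "Jane Doe"

obs = ET.SubElement(doc, "observation")
ET.SubElement(obs, "code", code="8480-6",
              codeSystemName="LOINC")                  # systolic blood pressure
ET.SubElement(obs, "value", value="120", unit="mm[Hg]")

print(ET.tostring(doc, encoding="unicode"))
```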
... Multicenter distributed data networks support rapid evidence generation in large and diverse populations, assessment of treatment effect heterogeneity, and evaluation of rare exposures or outcomes (1-3). Existing large-scale networks include the Sentinel System (4,5), the Health Care Systems Research Collaboratory (6), and the National Patient-Centered Clinical Research Network (7). However, efficient and privacy-protecting data-sharing remains a challenge in distributed data network studies. ...
Article
Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using 2 empirical examples within a 3-site distributed data network, we tested combinations of 3 aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), 4 confounding adjustment methods (matching, stratification, inverse probability weighting, and matching weighting), and 2 summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of combinations of these data-sharing and adjustment methods by comparing their results with results from the corresponding pooled individual-level data analysis (reference analysis). For both types of outcomes, the method combinations examined yielded results identical or comparable to the reference results in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on the adjustment method; for example, risk-set data-sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data-sharing generally performed better, while summary-table and effect-estimate data-sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing of individual-level data.
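The risk-set data-sharing approach evaluated in this article can be illustrated with a toy example: at each event time a site releases only the exposure status of the case and the counts of exposed and unexposed patients still at risk, which is enough to evaluate a Cox-type partial likelihood for a single binary exposure. All numbers below are invented.

```python
# Illustrative sketch of "risk-set" aggregate-level data-sharing: at each
# event time a site releases only the case's exposure status and the counts
# of exposed and unexposed patients still at risk. These aggregates suffice
# for a Cox-type partial likelihood with one binary exposure.
import math

site_risk_sets = [
    # (event_time, case_exposed, n_exposed_at_risk, n_unexposed_at_risk)
    (30, True, 120, 240),
    (45, False, 118, 236),
    (60, True, 110, 230),
]

def logl_contribution(case_exposed, n_exp, n_unexp, hr):
    """Partial-likelihood term for one risk set at hazard ratio hr."""
    numerator = hr if case_exposed else 1.0
    return math.log(numerator / (n_exp * hr + n_unexp))

hr = 1.5  # candidate hazard ratio
print(sum(logl_contribution(c, ne, nu, hr)
          for _, c, ne, nu in site_risk_sets))
```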
... The SREI architecture promotes security and interoperability of registry data, maintaining the main characteristic of the Brazilian registry system: public service delegated to private agents (the registrars), who are also currently responsible for providing information about the registry data. DDNs represent a new network architecture paradigm that enables data access and interoperability while mitigating most security and privacy risks associated with data transfer and maintenance of centralized data repositories (Maro et al., 2009). ...
Conference Paper
Full-text available
The Registering Property indicator of the Doing Business report is one of the 11 indicator sets published in order to evaluate business regulations in 190 countries. Tracking transactions at registry offices is key for evaluating this indicator. Currently, the availability of specific economic indicators for the Brazilian real estate sector is quite limited and heterogeneous, based on sparse and unrelated data sources. The digital development of the Brazilian property registry system, based on the Electronic Real Estate Registry System (SREI) model, requires the consolidation of statistical indicators on the operation of all registry offices in Brazil. SREI defines a distributed data architecture which promotes security and interoperability of registry data, offering a privacy by design approach for measuring property registry with the advent of the Brazilian General Data Protection Regulation (LGPD). This article presents an econometric model to evaluate property registry activity in Brazil in compliance with the data protection law.
... Analyses using larger populations can improve accuracy in estimation and prediction. The integration of research networks into healthcare systems also allows rapid translation and dissemination of research findings into evidence-based healthcare decision-making to improve health outcomes, consistent with the idea of a learning health system 4-9.
Article
Full-text available
Integrating real-world data (RWD) from several clinical sites offers great opportunities to improve estimation with a more general population compared to analyses based on a single clinical site. However, sharing patient-level data across sites is practically challenging due to concerns about maintaining patient privacy. We develop a distributed algorithm to integrate heterogeneous RWD from multiple clinical sites without sharing patient-level data. The proposed distributed conditional logistic regression (dCLR) algorithm can effectively account for between-site heterogeneity and requires only one round of communication. Our simulation study and data application with the data of 14,215 COVID-19 patients from 230 clinical sites in the UnitedHealth Group Clinical Research Database demonstrate that the proposed distributed algorithm provides an estimator that is robust to heterogeneity in event rates when efficiently integrating data from multiple clinical sites. Our algorithm is therefore a practical alternative to both meta-analysis and existing distributed algorithms for modeling heterogeneous multi-site binary outcomes.
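The dCLR paper's code is not reproduced here, but the one-communication-round pattern it shares with other surrogate-likelihood methods can be sketched. Below, assuming a generic ODAL-style surrogate (not the authors' conditional-likelihood construction, which absorbs site-specific effects), each simulated site transmits a single gradient vector and the lead site maximizes a locally corrected objective; all names and data are illustrative.

```python
# A minimal sketch of one-round distributed logistic regression in the
# spirit of surrogate-likelihood methods (e.g., ODAL). It is NOT the
# paper's dCLR algorithm; names and simulated data are illustrative.
import numpy as np
from scipy.optimize import minimize

def neg_loglik(beta, X, y):
    eta = X @ beta
    return np.mean(np.log1p(np.exp(eta)) - y * eta)

def grad(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (p - y) / len(y)

rng = np.random.default_rng(0)
beta_true = np.array([0.5, -1.0])
sites = []
for _ in range(5):                    # 5 sites; records never leave a site
    X = rng.normal(size=(400, 2))
    sites.append((X, rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))))

lead_X, lead_y = sites[0]
beta_init = minimize(lambda b: neg_loglik(b, lead_X, lead_y),
                     np.zeros(2), method="BFGS").x

# One communication round: each site returns only a gradient vector.
global_grad = np.mean([grad(beta_init, X, y) for X, y in sites], axis=0)

# Lead site maximizes the surrogate: local likelihood + linear correction.
correction = global_grad - grad(beta_init, lead_X, lead_y)
surrogate = lambda b: neg_loglik(b, lead_X, lead_y) + correction @ b
beta_hat = minimize(surrogate, beta_init, method="BFGS").x
print("surrogate-likelihood estimate:", beta_hat.round(3))
```

Only two aggregate vectors per site ever cross institutional boundaries here, which is what makes the single round of communication practical in a privacy-constrained network.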
... Governance of scientific networks can address a wide array of issues-such as developing and overseeing procedures to request and use data; setting research priorities; assuring compliance with security, privacy, and human subject research requirements; addressing proprietary concerns of participating organizations; monitoring research activities; ensuring data quality and integrity; addressing conflicts of interest; developing and maintaining transparency of activity and results; and defining guidance related to data access and use, reproducibility, publishing rights, and dispute resolution. 2 Too often, scientific networks do not develop governance processes proactively or systematically, and confront issues only as problems arise. Even if network leaders and collaborators foresee the need to develop governance approaches, they may underestimate the time and effort required. ...
Article
Full-text available
Introduction: The Patient Outcomes Research to Advance Learning (PORTAL) Network was established with funding from the Patient-Centered Outcomes Research Institute (PCORI) in 2014. The PORTAL team adapted governance structures and processes from past research network collaborations. We review and outline the structures and processes of the PORTAL governance approach and describe how proactively focusing on priority areas helped us facilitate an ambitious research agenda. Background: For years a variety of funders have supported large-scale infrastructure grants to promote the use of clinical datasets to answer important comparative effectiveness research (CER) questions. These awards have provided the impetus for health care systems to join forces in creating clinical data research networks. Often, these scientific networks do not develop governance processes proactively or systematically, and address issues only as problems arise. Even if network leaders and collaborators foresee the need to develop governance approaches, they may underestimate the time and effort required to develop sound processes. The resulting delays can impede research progress. Innovation: Because the PORTAL sites had built trust and a foundation of collaboration by participating with one another in past research networks, essential elements of effective governance such as guiding principles, decision-making processes, project governance, data governance, and stakeholders in governance were familiar to PORTAL investigators. This trust and familiarity enabled the network to rapidly prioritize areas that required sound governance approaches: responding to new research opportunities, creating a culture of trust and collaboration, conducting individual studies within the broader network, assigning responsibility and credit to scientific investigators, sharing data while protecting privacy/security, and allocating resources. The PORTAL Governance Document, complete with a Toolkit of Appendices, is included for reference and for adaptation by other networks. Credibility: As a result of identifying project-based governance priorities (IRB approval, subcontracting, selection of new research including lead PI and participating sites, and authorship) and data governance priorities (reciprocal data use agreement, analytic plan procedures, and other tools for data governance), PORTAL established most of its governance structure by Month 6 of the 18-month project. This allowed science to progress and collaborators to experience first-hand how the structures and procedures functioned in the remaining 12 months of the project, leaving ample time to refine them and to develop new structures or processes as necessary. Discussion: The use of procedures and processes with which participating investigators and their home institutions were already familiar allowed project and regulatory requirements to be established quickly to protect patients, their data, and the health care systems that act as stewards for both. As the project progressed, PORTAL was able to test and adjust the structures it put in place, and to make substantive revisions by Month 17. As a result, priority processes have been predictable, transparent, and effective. Conclusion/next steps: Strong governance practices are a stewardship responsibility of research networks to justify the trust of patients, health plan members, health care delivery organizations, and other stakeholders.
Well-planned governance can reduce the time necessary to initiate the scientific activities of a network, a particular concern when the time frame to complete research is short. Effective network and data governance structures protect patient and institutional data as well as the interests of investigators and their institutions, and assure that the network has built an environment to meet the goals of the research.
... We illustrate our CNB method by applying it to the task of predicting the risk of CV events from longitudinal electronic health record data. The data come from a healthcare system in the Midwestern USA and were extracted from the HMO Research Network Virtual Data Warehouse (HMORN VDW) associated with that system [36][37][38]. The VDW stores data including insurance enrollment, demographics, pharmaceutical dispensing, utilization, vital signs, laboratory, census, and death records. ...
Article
Models for predicting the probability of experiencing various health outcomes or adverse events over a certain time frame (e.g., having a heart attack in the next 5 years) based on individual patient characteristics are important tools for managing patient care. Electronic health data (EHD) are appealing sources of training data because they provide access to large amounts of rich individual-level data from present-day patient populations. However, because EHD are derived by extracting information from administrative and clinical databases, some fraction of subjects will not be under observation for the entire time frame over which one wants to make predictions; this loss to follow-up is often due to disenrollment from the health system. For subjects without complete follow-up, whether or not they experienced the adverse event is unknown, and in statistical terms the event time is said to be right-censored. Most machine learning approaches to the problem have been relatively ad hoc; for example, common approaches for handling observations in which the event status is unknown include 1) discarding those observations, 2) treating them as non-events, 3) splitting those observations into two observations: one where the event occurs and one where the event does not. In this paper, we present a general-purpose approach to account for right-censored outcomes using inverse probability of censoring weighting (IPCW). We illustrate how IPCW can easily be incorporated into a number of existing machine learning algorithms used to mine big health care data including Bayesian networks, k-nearest neighbors, decision trees, and generalized additive models. We then show that our approach leads to better calibrated predictions than the three ad hoc approaches when applied to predicting the 5-year risk of experiencing a cardiovascular adverse event, using EHD from a large U.S. Midwestern healthcare system.
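The IPCW idea described above can be made concrete with a small simulation. In the sketch below (our own toy setup, not the paper's data or code), a Kaplan-Meier estimate of the censoring distribution supplies the weights: subjects censored before the 5-year horizon get weight zero, and the remaining subjects are up-weighted by the inverse probability of remaining uncensored, after which any classifier that accepts sample weights can be trained.

```python
# Minimal sketch of inverse probability of censoring weighting (IPCW)
# for training a classifier on right-censored 5-year outcomes.
# Illustrative simulation; variable names are ours, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

def censoring_km(follow_up, censored):
    """Kaplan-Meier estimate G(t) of the censoring survival function."""
    grid = np.unique(follow_up[censored])
    steps, s = [], 1.0
    for t in grid:
        at_risk = np.sum(follow_up >= t)
        d = np.sum((follow_up == t) & censored)
        s *= 1.0 - d / at_risk
        steps.append((t, s))
    def G(t):
        out = 1.0
        for tt, ss in steps:
            if tt <= t:
                out = ss
        return out
    return G

rng = np.random.default_rng(1)
n, tau = 2000, 5.0                          # predict events within 5 years
X = rng.normal(size=(n, 3))
event_time = rng.exponential(8 * np.exp(-0.7 * X[:, 0]))
censor_time = rng.exponential(10.0, size=n)  # e.g., disenrollment
follow_up = np.minimum(event_time, censor_time)
was_censored = censor_time < event_time
had_event = (event_time <= censor_time) & (event_time <= tau)

G = censoring_km(follow_up, was_censored)
w = np.zeros(n)
for i in range(n):
    if had_event[i]:
        w[i] = 1.0 / G(event_time[i])       # weight by P(uncensored at T_i)
    elif follow_up[i] >= tau:
        w[i] = 1.0 / G(tau)                 # event-free, observed through tau
    # else: censored before tau, outcome unknown -> weight stays 0 (dropped)

keep = w > 0
clf = LogisticRegression().fit(X[keep], had_event[keep].astype(int),
                               sample_weight=w[keep])
print("predicted 5-year risks:", clf.predict_proba(X[:3])[:, 1].round(3))
```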
... Although many of these factors have certainly been described in the peer-reviewed literature, and although there are certainly examples in the literature where groups have overcome these challenges to demonstrate benefit, [14][15][16][17][18][19][20][21][22][23][24] such endeavors too often require substantial efforts and resources, are slow, and are difficult to scale or disseminate. 11,[25][26][27] Indeed, these widespread challenges often frustrate the efforts of individuals charged with accelerating the generation of evidence at the point-of-care, including informaticians, health services researchers, and other methodologists. ...
Article
Accelerating clinical and translational science and improving healthcare effectiveness, quality, and efficiency are top priorities for the United States. Increasingly, the success of such initiatives relies on leveraging point-of-care activities, data, and resources to generate evidence through routine practice. At present, leveraging healthcare activities to advance knowledge is challenging. Underlying these challenges are a variety of persistent technological, regulatory, fiscal, and socio-organizational realities. Fundamentally, these result from the fact that the current healthcare system is designed around a paradigm that enables individual patient care and views the connection between research and practice as unidirectional (ie, research findings are applied to practice using evidence-based medicine) but does not support research-related activities during practice. We suggest that a fundamental paradigm shift is needed to redefine the relationship between research and practice as bidirectional rather than unidirectional and propose the concept of evidence generating medicine to provide a framework for realizing such a shift. We discuss how a transition toward evidence generating medicine would result in a range of much-needed system-level changes that would facilitate rather than frustrate the ongoing efforts of informaticians, health services researchers, and others working to accelerate research and improve healthcare.
... Analyses using larger populations benefit from better accuracy in estimation and prediction. Furthermore, the integration of research networks inside healthcare systems allows rapid translation and dissemination of research findings into evidence-based healthcare decision making to improve health outcomes, consistent with the idea of a learning health system [6][7][8][9][10][11]. ...
Preprint
Objectives Integrating electronic health record (EHR) data from several clinical sites offers great opportunities to improve estimation with a more general population compared to analyses based on a single clinical site. However, sharing patient-level data across sites is practically challenging due to concerns about maintaining patient privacy. The objective of this study is to develop a novel distributed algorithm to integrate heterogeneous EHR data from multiple clinical sites without sharing patient-level data. Materials and Methods The proposed distributed algorithm for binary regression can effectively account for between-site heterogeneity and is communication-efficient. Our method is built on a pairwise likelihood function in the extended Mantel-Haenszel regression, which is known to be statistically highly efficient. We construct a surrogate pairwise likelihood function that closely approximates the target pairwise likelihood. We show that the proposed surrogate pairwise likelihood leads to a consistent and asymptotically normal estimator through efficient communication, without sharing individual patient-level data. We study the empirical performance of the proposed method through a systematic simulation study and an application with data from 14,215 COVID-19 patients at 230 clinical sites in the UnitedHealth Group Clinical Research Database. Results The proposed method was shown to perform close to the gold standard approach under extensive simulation settings. When the event rate is <5%, the relative bias of the proposed estimator is 30% smaller than that of the meta-analysis estimator. The proposed method retained high accuracy across different sample sizes and event rates compared with meta-analysis. In the data evaluation, the proposed estimate has a relative bias <9% when the event rate is <1%, whereas the meta-analysis estimate has a relative bias at least 10% higher than that of the proposed method. Conclusions Our simulation study and data application demonstrate that the proposed distributed algorithm provides an estimator that is robust to heterogeneity in event rates when effectively integrating data from multiple clinical sites. Our algorithm is therefore an effective alternative to both meta-analysis and existing distributed algorithms for modeling heterogeneous multi-site binary outcomes.
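To illustrate why a pairwise likelihood tolerates between-site heterogeneity, the toy sketch below forms case-control pairs within each simulated site, so the site-specific intercept cancels from each pair's contribution. It is a simplified stand-in for the extended Mantel-Haenszel construction, not the authors' implementation; the distributed aspect (each site evaluating only its own term) is elided, and pairs are randomly subsampled for brevity.

```python
# Toy sketch of the pairwise (conditional) likelihood idea: within each
# site, a case-control pair contributes sigmoid(beta'(x_case - x_ctrl)),
# so site-specific intercepts cancel. Simplified stand-in only; not the
# authors' code. Pairs are randomly subsampled for brevity.
import numpy as np
from scipy.optimize import minimize

def neg_pairwise_loglik(beta, site_data, max_pairs=500, seed=0):
    rng = np.random.default_rng(seed)   # fixed seed: deterministic objective
    total, count = 0.0, 0
    for X, y in site_data:
        cases, controls = np.where(y == 1)[0], np.where(y == 0)[0]
        for _ in range(min(max_pairs, len(cases) * len(controls))):
            i, j = rng.choice(cases), rng.choice(controls)
            diff = (X[i] - X[j]) @ beta
            total += np.log1p(np.exp(-diff))   # -log sigmoid(diff)
            count += 1
    return total / count

rng = np.random.default_rng(2)
beta_true = np.array([1.0, -0.5])
site_data = []
for _ in range(4):                      # 4 sites with heterogeneous rates
    X = rng.normal(size=(300, 2))
    alpha = rng.normal(-1.0, 1.0)       # site-specific intercept
    p = 1 / (1 + np.exp(-(alpha + X @ beta_true)))
    site_data.append((X, rng.binomial(1, p)))

est = minimize(neg_pairwise_loglik, np.zeros(2),
               args=(site_data,), method="Nelder-Mead").x
print("pairwise-likelihood estimate:", est.round(2))
```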
... A number of different distributed data network approaches have been proposed and used in the literature [4][5][6][7]. Below we review the architectures of these distributed networks and explain how they address the privacy problem. ...
... The core Data Partner organizations include national and regional health plan partners, integrated delivery systems, and Medicare fee-for-service data. Sentinel operates as a privacy-preserving distributed data network, 5,6 with each organization maintaining possession and operational control over its data. To allow efficient and consistent implementation of a study across the System's distinct sources, each core ... Several key Sentinel initiatives completed before the pandemic laid the groundwork for COVID-19 activities. ...
Article
The US Food and Drug Administration's Sentinel System was established in 2009 to use routinely collected electronic health data for improving the national capability to assess post‐market medical product safety. Over more than a decade, Sentinel has become an integral part of FDA's surveillance capabilities and has been used to conduct analyses that have contributed to regulatory decisions. FDA's role in the COVID‐19 pandemic response has necessitated an expansion and enhancement of Sentinel. Here we describe how the Sentinel System has supported FDA's response to the COVID‐19 pandemic. We highlight new capabilities developed, key data generated to date, and lessons learned, particularly with respect to working with inpatient electronic health record data. Early in the pandemic, Sentinel developed a multi‐pronged approach to support FDA's anticipated data and analytic needs. It incorporated new data sources, created a rapidly refreshed database, developed protocols to assess the natural history of COVID‐19, validated a diagnosis‐code based algorithm for identifying patients with COVID‐19 in administrative claims data, and coordinated with other national and international initiatives. Sentinel is poised to answer important questions about the natural history of COVID‐19 and is positioned to use this information to study the use, safety, and potentially the effectiveness of medical products used for COVID‐19 prevention and treatment.
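The validated diagnosis-code algorithm mentioned above is specified in the cited work; the sketch below only illustrates the general shape of a claims-based phenotype, scanning claim rows for qualifying codes. The code list (U07.1 plus the early-2020 interim code B97.29) and the record layout are illustrative assumptions, not Sentinel's validated definition.

```python
# Sketch of a diagnosis-code-based phenotype over claims rows, showing
# the general shape of such an algorithm. The code set and record
# layout are illustrative; Sentinel's validated algorithm is defined
# in the cited paper.
from dataclasses import dataclass
from datetime import date

COVID_CODES = {"U07.1", "B97.29"}   # illustrative qualifying codes

@dataclass
class Claim:
    patient_id: str
    service_date: date
    dx_codes: tuple

def covid_patients(claims):
    """Return IDs of patients with any qualifying COVID-19 diagnosis."""
    hits = set()
    for c in claims:
        if any(code in COVID_CODES for code in c.dx_codes):
            hits.add(c.patient_id)
    return hits

claims = [
    Claim("A", date(2020, 5, 2), ("U07.1", "J12.89")),
    Claim("B", date(2020, 2, 10), ("J18.9",)),
]
print(covid_patients(claims))   # {'A'}
```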
Chapter
Rapid learning systems represent an opportunity for advancing evidence-based medical care through the development of integrated health systems that aggregate and apply evidence from real-time clinical data. Despite the continual growth of the published medical literature, patients and providers alike are often unsure of best practices surrounding therapies and supportive care for a variety of medical conditions including cancer. Health-care spending in oncology patients is growing in the USA, and the value of purchased oncology care is not well defined. The ultimate application of new knowledge from the medical literature to an individual patient with consideration of efficacy and cost is thus at best unclear. Rapid learning systems offer the potential to leverage real-time, practice-based clinical data in conjunction with research, administrative, and patient-reported data to drive the process of scientific discovery and address these knowledge gaps. Efforts to develop such systems are supported by the evolution of more sophisticated health information technologies, comparative effectiveness research, patient-reported outcome research, and an increasing focus on quality improvement. Rapid learning systems provide the opportunity for continuous bidirectional learning and have the potential to positively reshape individual care delivery. In this chapter, we will discuss the components of rapid learning systems and explore some working examples of rapid learning systems.
Chapter
Registries are a type of database routinely used to collect a standardized set of information on a defined cohort of patients. They are used in many different areas, from monitoring drug and device safety to comparative effectiveness research to quality improvement efforts looking to standardize care practices for patients with chronic disease. For many different types of research, particularly comparative effectiveness research or research involving children, no single institution has a large enough patient population to perform a proper study. This, coupled with the growing digitization of medical records, has led to an increased effort to create distributed research networks. The widespread adoption of electronic health records (EHRs) has resulted in a strong push to use them as the primary method of collection for registry data, capturing the necessary elements as part of routine clinical care. This would achieve the long-hoped for goal of “data in once,” moving clinical and translational research away from the common practice of populating registries through double data entry. This is illustrated by a discussion of the registry infrastructure utilized by the ImproveCareNow network, a collaborative of clinicians focused on improving the care of children with Inflammatory Bowel Disease.
Article
Purpose: The Heart Protection Study 2-Treatment of HDL to Reduce the Incidence of Vascular Events (HPS2-THRIVE) trial found higher incidence rates of adverse reactions, including bleeding, in patients receiving the combination of extended-release niacin and laropiprant versus placebo. It is not known whether these adverse events are attributable to laropiprant, not approved in the USA, or to extended-release niacin. We compared rates of major gastrointestinal bleeding and intracranial hemorrhage among initiators of extended-release niacin and initiators of fenofibrate. Methods: We used Mini-Sentinel (now Sentinel) to conduct an observational, new user cohort analysis. We included data from 5 Data Partners covering the period from January 1, 2007 to August 31, 2013. Individuals who initiated extended-release niacin were propensity score-matched to individuals who initiated fenofibrate. Within the matched cohorts, we used Cox proportional hazards models to compare rates of hospitalization for major gastrointestinal bleeding events and intracranial hemorrhage assessed using validated claims-based algorithms. Results: A total of 234,242 eligible extended-release niacin initiators were identified, of whom 210,389 (90%) were 1:1 propensity score-matched to eligible fenofibrate initiators. In propensity score-matched analyses, no differences were observed between exposure groups in rates of major gastrointestinal bleeding (hazard ratio [HR], 0.98; 95% confidence interval [CI], 0.82 to 1.18) or intracranial hemorrhage (HR, 1.21; 95% CI, 0.66 to 2.22). Results were similar in pre-specified sensitivity and subgroup analyses. Conclusions: We did not observe evidence for an association between extended-release niacin versus fenofibrate and rates of major gastrointestinal bleeding or intracranial hemorrhage.
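The design described, a propensity score-matched new-user cohort analyzed with Cox models, can be outlined in a few steps. The sketch below uses simulated data, greedy 1:1 caliper matching, and the lifelines library's Cox fitter; it mirrors the shape of the analysis, not Mini-Sentinel's actual tooling, and the caliper and covariates are illustrative.

```python
# Sketch of a 1:1 propensity-score-matched new-user cohort analysis
# with a Cox model. Simulated data and greedy caliper matching are
# illustrative only, not the Sentinel programs used in the study.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 4000
age = rng.normal(60, 10, n)
diabetes = rng.binomial(1, 0.3, n)
treated = rng.binomial(1, 1 / (1 + np.exp(-(0.03 * (age - 60) + 0.5 * diabetes))))
time = rng.exponential(5 / np.exp(0.02 * (age - 60)))  # time to bleed/censor
event = rng.binomial(1, 0.4, n)

# Step 1: estimate propensity scores from baseline covariates.
X = np.column_stack([age, diabetes])
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbor matching with a 0.01 caliper.
t_idx = np.where(treated == 1)[0]
c_idx = list(np.where(treated == 0)[0])
pairs = []
for i in t_idx:
    if not c_idx:
        break
    j = min(c_idx, key=lambda k: abs(ps[i] - ps[k]))
    if abs(ps[i] - ps[j]) < 0.01:
        pairs.append((i, j))
        c_idx.remove(j)

# Step 3: Cox proportional hazards model on the matched cohort.
matched = [i for pair in pairs for i in pair]
df = pd.DataFrame({"T": time[matched], "E": event[matched],
                   "treated": treated[matched]})
cph = CoxPHFitter().fit(df, duration_col="T", event_col="E")
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```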
Chapter
Many research and improvement activities, especially those that involve rare, pediatric, or chronic conditions, require the ability to pool, access or query data from multiple institutions. Here we describe informatics architectures that support quality improvement and research networks, or learning networks, as well as those that can support large-scale distributed research networks. Even though the activities and motivations of these networks are very different, they still require many of the same considerations in order to perform meaningful analysis on data that have been collected in multiple settings. This includes the measurement and characterization of data quality, the use of standardized or common data models, and the tracking and management of patient privacy, among others. We describe the informatics architectures of several learning networks, and two distributed research networks, detailing the commonalities and differences between them.
Article
Purpose of review: An important component of the Food and Drug Administration's Sentinel Initiative is the active post-market risk identification and analysis (ARIA) system, which utilizes semi-automated, parameterized computer programs to implement propensity-score adjusted and self-controlled risk interval designs to conduct targeted surveillance of medical products in the Sentinel Distributed Database. In this manuscript, we review literature relevant to the development of these programs and describe their application within the Sentinel Initiative. Recent findings: These quality-checked and publicly available tools have been successfully used to conduct rapid, replicable, and targeted safety analyses of several medical products. In addition to speed and reproducibility, use of semi-automated tools allows investigators to focus on decisions regarding key methodological parameters. We also identified challenges associated with the use of these methods in distributed and prospective datasets like the Sentinel Distributed Database, namely uncertainty regarding the optimal approach to estimating propensity scores in dynamic data among data partners of heterogeneous size. Summary: Future research should focus on the methodological challenges raised by these applications as well as developing new modular programs for targeted surveillance of medical products.
Article
Full-text available
Purpose: Practice-based research networks (PBRNs) have developed dynamically across the world, paralleling the emergence of the primary care discipline. While this review focuses on the internal environment of PBRNs, the complete framework will be presented incrementally in future publications. Methods: We conducted a scoping review of the published and gray literature. Electronic databases, including MEDLINE (PubMed), OVID, CINAHL (EBSCOhost), Scopus, and SAGE Premier, were searched for publications between January 1, 1965 and December 31, 2020 for English-language articles. Rigorous inclusion/exclusion criteria were implemented to identify relevant publications, and inductive thematic analysis was applied to elucidate key elements, subthemes, and themes. Social network theory was used to synthesize findings. Results: A total of 229 publications described the establishment of 93 PBRNs in 15 countries that met the inclusion criteria. The overall framework yielded 3 main themes, 12 subthemes, and 57 key elements. Key PBRN activities included relationship building between academia and practitioners and development of a learning environment through multidirectional communication. Conclusions: PBRNs across many countries contributed significantly to shaping the landscape of primary health care and became an integral part of it. Many common features within the sphere of PBRNs can be identified that seem to promote their establishment across the world.
Article
Full-text available
Introduction Distributed research networks (DRNs) are critical components of the strategic roadmaps for the National Institutes of Health and the Food and Drug Administration as they work to move toward large-scale systems of evidence generation. The National Patient-Centered Clinical Research Network (PCORnet®) is one of the first DRNs to incorporate electronic health record data from multiple domains on a national scale. Before conducting analyses in a DRN, it is important to assess the quality and characteristics of the data. Methods PCORnet’s Coordinating Center is responsible for evaluating foundational data quality, or assessing fitness-for-use across a broad research portfolio, through a process called data curation. Data curation involves a set of analytic and querying activities to assess data quality coupled with maintenance of detailed documentation and ongoing communication with network partners. The first cycle of PCORnet data curation focused on six domains in the PCORnet common data model: demographics, diagnoses, encounters, enrollment, procedures, and vitals. Results The data curation process led to improvements in foundational data quality. Notable improvements included the elimination of data model conformance errors; a decrease in implausible height, weight, and blood pressure values; an increase in the volume of diagnoses and procedures; and more complete data for key analytic variables. Based on the findings of the first cycle, we made modifications to the curation process to increase efficiencies and further reduce variation among data partners. Discussion The iterative nature of the data curation process allows PCORnet to gradually increase the foundational level of data quality and reduce variability across the network. These activities help increase the transparency and reproducibility of analyses within PCORnet and can serve as a model for other DRNs.
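Foundational data-quality assessment of the kind described typically combines conformance checks (values drawn from an allowed vocabulary) and plausibility checks (clinically possible ranges). The sketch below shows both over a toy vitals table; the field names, value set, and thresholds are our illustrative assumptions, not PCORnet's data curation specifications.

```python
# Sketch of two foundational data-quality checks: a conformance check
# (values match an allowed value set) and a plausibility check (vitals
# within clinically possible ranges). Field names, the value set, and
# thresholds are illustrative, not PCORnet's.
import pandas as pd

PLAUSIBLE = {"height_cm": (30, 250), "weight_kg": (1, 500),
             "systolic_bp": (40, 300)}
ALLOWED_SEX = {"F", "M", "OT", "UN"}    # example value set

def curate(vitals: pd.DataFrame) -> pd.DataFrame:
    report = []
    bad_sex = ~vitals["sex"].isin(ALLOWED_SEX)
    report.append(("conformance: sex code", int(bad_sex.sum())))
    for col, (lo, hi) in PLAUSIBLE.items():
        bad = ~vitals[col].between(lo, hi)
        report.append((f"plausibility: {col}", int(bad.sum())))
    return pd.DataFrame(report, columns=["check", "violations"])

vitals = pd.DataFrame({
    "sex": ["F", "M", "X"],
    "height_cm": [165, 999, 180],
    "weight_kg": [70, 80, 0.2],
    "systolic_bp": [120, 135, 400],
})
print(curate(vitals))
```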
Article
Full-text available
Introduction: In aggregate, existing data quality (DQ) checks are currently represented in heterogeneous formats, making it difficult to compare, categorize, and index checks. This study contributes a data element-function conceptual model to facilitate the categorization and indexing of DQ checks and explores the feasibility of leveraging natural language processing (NLP) for scalable acquisition of knowledge of common data elements and functions from DQ check narratives. Methods: The model defines a "data element", the primary focus of the check, and a "function", the qualitative or quantitative measure over a data element. We applied NLP techniques to extract both from 172 checks for Observational Health Data Sciences and Informatics (OHDSI) and 3,434 checks for Kaiser Permanente's Center for Effectiveness and Safety Research (CESR). Results: The model was able to classify all checks. A total of 751 unique data elements and 24 unique functions were extracted. The top five frequent data element-function pairings for OHDSI were Person-Count (55 checks), Insurance-Distribution (17), Medication-Count (16), Condition-Count (14), and Observations-Count (13); for CESR, they were Medication-Variable Type (175), Medication-Missing (172), Medication-Existence (152), Medication-Count (127), and Socioeconomic Factors-Variable Type (114). Conclusions: This study shows the efficacy of the data element-function conceptual model for classifying DQ checks, demonstrates early promise of NLP-assisted knowledge acquisition, and reveals the great heterogeneity in the focus of DQ checks, confirming variation across intrinsic checks and use-case-specific "fitness-for-use" checks.
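A toy version of the data element-function extraction can be sketched with simple pattern matching: find a known data element term and a known function term in each check narrative. The vocabularies below are small illustrative subsets, and the paper's NLP pipeline is considerably more capable.

```python
# Toy sketch of extracting (data element, function) pairs from DQ-check
# narratives with pattern matching; the vocabularies are illustrative
# subsets, and the paper's NLP pipeline is more sophisticated.
import re

ELEMENTS = ["person", "insurance", "medication", "condition", "observation"]
FUNCTIONS = {"count": r"\bnumber of|\bcount\b",
             "missing": r"\bmissing\b",
             "distribution": r"\bdistribution\b",
             "existence": r"\bexists?\b"}

def classify(check_text: str):
    text = check_text.lower()
    element = next((e for e in ELEMENTS if e in text), None)
    function = next((f for f, pat in FUNCTIONS.items()
                     if re.search(pat, text)), None)
    return element, function

checks = [
    "Number of persons with at least one visit",
    "Distribution of insurance types by year",
    "Medication records with missing start date",
]
for c in checks:
    print(c, "->", classify(c))
```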
Article
Full-text available
As states have embraced additional flexibility to change coverage of and payment for Medicaid services, they have also faced heightened expectations for delivering high-value care. Efforts to meet these new expectations have increased the need for rigorous, evidence-based policy, but states may face challenges finding the resources, capacity, and expertise to meet this need. By describing state-university partnerships in more than 20 states, this commentary describes innovative solutions for states that want to leverage their own data, build their analytic capacity, and create evidence-based policy. From an integrated web-based system to improve long-term care to evaluating the impact of permanent supportive housing placements on Medicaid utilization and spending, these state partnerships provide significant support to their state Medicaid programs. In 2017, these partnerships came together to create a distributed research network that supports multi-state analyses. The Medicaid Outcomes Distributed Research Network (MODRN) uses a common data model to examine Medicaid data across states, thereby increasing the analytic rigor of policy evaluations in Medicaid, and contributing to the development of a fully functioning Medicaid innovation laboratory.
Article
Full-text available
Context: Sustaining electronic health data networks and maximizing return on federal investment in their development is essential for achieving national data insight goals for transforming health care. However, crossing the business model chasm from grant funding to self-sustaining viability is challenging. Case description: This paper presents lessons learned in seeking the sustainability of the Scalable Architecture for Federated Translational Inquiries Network (SAFTINet), an electronic health data network involving over 50 primary care practices in three states. SAFTINet was developed with funding from the Agency for Healthcare Research and Quality to create a multi-state network for comparative effectiveness research (CER) involving safety-net patients. Methods: Three analyses were performed: (1) a product gap analysis of alternative data sources; (2) a Strengths-Weaknesses-Opportunities-Threats (SWOT) analysis of SAFTINet in the context of competing alternatives; and (3) a customer discovery process involving approximately 150 SAFTINet stakeholders to identify SAFTINet's sustaining value proposition for health services researchers, clinical data partners, and policy makers. Findings: The results of this business model analysis informed SAFTINet's sustainability strategy. The fundamental high-level product needs were similar between the three primary customer segments: credible data, efficient and easy to use, and relevance to their daily work or 'jobs to be done'. However, how these benefits needed to be minimally demonstrated varied by customer, such that different supporting evidence was required. Major themes: The SAFTINet experience illustrates that commercialization-readiness and business model methods can be used to identify multi-sided value propositions for sustaining electronic health data networks and their data capabilities as drivers of health care transformation.
Chapter
Comparative effectiveness research (CER) addresses the effectiveness of alternative options for clinical modalities, including prevention, diagnoses, and treatments for a given medical condition, on outcomes of interest to clinicians and patients in real‐world patient populations and settings. The key elements of CER include the study of effectiveness (effect in the real world) rather than efficacy (ideal effect), as well as safety, and the comparison of alternative strategies rather than comparing one treatment to a placebo. CER can address evidence synthesis (identifying and summarizing already extant data addressing a question); evidence generation (creating new evidence using various methods); and evidence dissemination (distributing available data with the goal of modifying clinical decisions). The primary goal of CER is to increase the quantity and quality of scientific evidence to inform clinical decisions and improve patient care. To date, most CER has compared among alternative drugs, medical devices, and other therapeutics, so it is of particular relevance to pharmacoepidemiology and its methods. This chapter will provide a brief history of how CER was introduced and carried out in the US, Europe, and other countries, the definition and types of CER, the clinical and methodologic issues involved in conducting CER that are relevant to pharmacoepidemiology, and thoughts about reasonable expectations for the future.
Article
The US Food and Drug Administration (FDA) Sentinel System uses a distributed data network, a common data model, curated real-world data, and distributed analytic tools to generate evidence for FDA decision-making. Sentinel system needs include analytic flexibility, transparency, and reproducibility while protecting patient privacy. Based on over a decade of experience, a critical system limitation is the inability to identify enough medical conditions of interest in observational data to a satisfactory level of accuracy. Improving the system's ability to use computable phenotypes will require an "all of the above" approach that improves use of electronic health data while incorporating the growing array of complementary electronic health record data sources. FDA recently funded a Sentinel System Innovation Center and a Community Building and Outreach Center that will provide a platform for collaboration across disciplines to promote better use of real-world data for decision-making.
Chapter
In the present time, chatbots are essential tools used by many organizations to provide services to their targeted customers around the clock. This research focuses on a domain-specific chatbot that can be helpful for educational institutes. The chatbot acts as a virtual representative for admission seekers, providing answers about the university, its departments, admission fees, and other admission-related FAQs. For this research, frequently asked questions from a university were collected, and an unsupervised learning model combined with natural language processing techniques was deployed to answer the questions of admission candidates. Tokenization and stop-word removal, followed by vectorization, were applied to preprocess the training data. Users' inputs were processed in the same way, and tf-idf-based cosine similarity was then applied to retrieve the best answer. A user-centric evaluation metric was used to evaluate the model; by this metric, the current model showed approximately 80% accuracy.
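The retrieval pipeline described, tokenization and stop-word removal, tf-idf vectorization, and a cosine-similarity lookup, is straightforward to sketch with scikit-learn. The FAQ entries below are invented placeholders, and the similarity threshold is an assumption.

```python
# Minimal sketch of the described retrieval approach: preprocess FAQs,
# vectorize with tf-idf, and answer a query by cosine similarity.
# The FAQ entries and threshold are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faq = [
    ("What is the admission fee?", "The admission fee is listed per program."),
    ("Which departments offer CS degrees?", "The CS department offers BS and MS."),
    ("When does admission open?", "Admissions open each fall semester."),
]
questions = [q for q, _ in faq]

# Tokenization and stop-word removal happen inside the vectorizer.
vectorizer = TfidfVectorizer(stop_words="english")
Q = vectorizer.fit_transform(questions)

def answer(user_input: str, threshold: float = 0.2) -> str:
    sims = cosine_similarity(vectorizer.transform([user_input]), Q)[0]
    best = sims.argmax()
    return faq[best][1] if sims[best] >= threshold else "Sorry, I don't know."

print(answer("how much is the admission fee"))
```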
Article
The Sentinel System is a national electronic postmarketing resource established by the US Food and Drug Administration to support assessment of the safety and effectiveness of marketed medical products. It has built a large, multi-institutional, distributed data network that contains comprehensive electronic health data, covering about 700 million person-years of longitudinal observation time nationwide. With its sophisticated infrastructure and a large selection of flexible analytic tools, the Sentinel System permits rapid and secure analyses, while preserving patient privacy and health-system autonomy. The Sentinel System also offers enhanced capabilities, including accessing full-text medical records, supporting randomized clinical trials embedded in healthcare delivery systems, and facilitating effective collection of patient-reported data using mobile devices, among many other research programs. The nephrology research community can use the infrastructure, tools, and data that this national resource offers for evidence generation. This review summarizes the Sentinel System and its ability to rapidly generate high-quality, real-world evidence; discusses the program’s experience in, and potential for, addressing gaps in kidney care; and outlines avenues for conducting research, leveraging this national resource in collaboration with Sentinel investigators.
Article
Full-text available
Background Certain medications may increase the risk of death or death from specific causes (eg, sudden cardiac death), but these risks may not be identified in premarket randomized trials. Having the capacity to examine death in postmarket safety surveillance activities is important to the US Food and Drug Administration’s (FDA) mission to protect public health. Distributed networks of electronic health plan databases used by the FDA to conduct multicenter research or medical product safety surveillance studies often do not systematically include death or cause-of-death information. Objective This study aims to develop reusable, generalizable methods for linking multiple health plan databases with the Centers for Disease Control and Prevention’s National Death Index Plus (NDI+) data. Methods We will develop efficient administrative workflows to facilitate multicenter institutional review board (IRB) review and approval within a distributed network of 6 health plans. The study will create a distributed NDI+ linkage process that avoids sharing of identifiable patient information between health plans or with a central coordinating center. We will develop standardized criteria for selecting and retaining NDI+ matches and methods for harmonizing linked information across multiple health plans. We will test our processes within a use case comprising users and nonusers of antiarrhythmic medications. Results We will use the linked health plan and NDI+ data sets to estimate the incidences and incidence rates of mortality and specific causes of death within the study use case and compare the results with reported estimates. These comparisons provide an opportunity to assess the performance of the developed NDI+ linkage approach and lessons for future studies requiring NDI+ linkage in distributed database settings. This study is approved by the IRB at Harvard Pilgrim Health Care in Boston, MA. Results will be presented to the FDA at academic conferences and published in peer-reviewed journals. Conclusions This study will develop and test a reusable distributed NDI+ linkage approach with the goal of providing tested NDI+ linkage methods for use in future studies within distributed data networks. Having standardized and reusable methods for systematically obtaining death and cause-of-death information from NDI+ would enhance the FDA’s ability to assess mortality-related safety questions in the postmarket, real-world setting. International Registered Report Identifier (IRRID) DERR1-10.2196/21811
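The protocol above deliberately avoids moving identifiable information between plans or to a coordinating center. As a generic illustration of one ingredient often used in privacy-preserving linkage (not the study's actual NDI+ method), the sketch below derives a keyed hash token from normalized identifiers, so that two sites can produce matching tokens locally without exchanging raw PII. The key and fields are invented.

```python
# Generic illustration of one privacy-preserving linkage ingredient:
# a keyed hash token derived from normalized identifiers, so records
# can be matched without exchanging raw PII. This is NOT the study's
# protocol; the key and field choices are invented for illustration.
import hashlib
import hmac

SHARED_KEY = b"rotate-me"   # distributed out of band, never with the data

def linkage_token(first: str, last: str, dob: str, ssn4: str) -> str:
    msg = "|".join(s.strip().upper() for s in (first, last, dob, ssn4))
    return hmac.new(SHARED_KEY, msg.encode(), hashlib.sha256).hexdigest()

# Two plans derive identical tokens for the same member, locally:
print(linkage_token("Jane", "Doe", "1950-01-31", "1234"))
print(linkage_token("jane ", "DOE", "1950-01-31", "1234"))  # same token
```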
Chapter
The goals of pharmacovigilance include improving patient care and safety, as well as public health and safety, in relation to the use of medicines. This chapter covers the emerging science of pharmacovigilance for biosimilar medicines. Pharmacovigilance considerations during both premarketing and post‐marketing periods are discussed. A careful examination of both current and proposed rules for biosimilar pharmacovigilance, as well as the key challenges and innovative solutions that are evolving and being developed and tested in the field, is discussed in the context of the infrastructure that exists for the safety monitoring of small molecule drugs and innovator biologics. Many of the approaches (e.g. signal refinement processes) and systems (e.g. existing passive and active surveillance programs) used for pharmacovigilance of small molecule drugs and medical devices apply to innovator biologics and biosimilars.
Article
Full-text available
Introduction Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these concerns. However, DRA is not routinely implemented in large DDNs. Objective We describe the design and implementation of a process framework and query workflow that allow automatable DRA in real-world DDNs that use PopMedNet™, an open-source distributed networking software platform. Methods We surveyed and catalogued existing hardware and software configurations at all data partners in the Sentinel System, a PopMedNet-driven DDN. Key guiding principles for the design included minimal disruptions to the current PopMedNet query workflow and minimal modifications to data partners’ hardware configurations and software requirements. Results We developed and implemented a three-step process framework and PopMedNet query workflow that enables automatable DRA: 1) assembling a de-identified patient-level dataset at each data partner, 2) distributing a DRA package to data partners for local iterative analysis, and 3) iteratively transferring intermediate files between data partners and analysis center. The DRA query workflow is agnostic to statistical software, accommodates different regression models, and allows different levels of user-specified automation. Discussion The process framework can be generalized to and the query workflow can be adopted by other PopMedNet-based DDNs. Conclusion DRA has great potential to change the paradigm of data analysis in DDNs. Successful implementation of DRA in Sentinel will facilitate adoption of the analytic approach in other DDNs.
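The iterative query workflow described, in which intermediate files shuttle between data partners and the analysis center, maps naturally onto distributed Newton-Raphson for logistic regression: each round, partners return gradient and Hessian contributions evaluated at the current estimate. The sketch below simulates this loop in one process; PopMedNet orchestration and file transfer are elided, and all data are invented.

```python
# Sketch of the iterative workflow in distributed regression analysis:
# each round, every data partner computes summary terms (gradient and
# Hessian of its local logistic likelihood) at the current estimate,
# and the analysis center combines them in a Newton update. File
# transfer and PopMedNet orchestration are elided; data stay local.
import numpy as np

def local_terms(beta, X, y):
    p = 1 / (1 + np.exp(-(X @ beta)))
    grad = X.T @ (y - p)                 # score contribution
    hess = -(X.T * (p * (1 - p))) @ X    # observed-information contribution
    return grad, hess

rng = np.random.default_rng(4)
beta_true = np.array([0.8, -0.4])
partners = []
for _ in range(3):
    X = rng.normal(size=(500, 2))
    partners.append((X, rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))))

beta = np.zeros(2)
for _ in range(8):                       # iterative query rounds
    terms = [local_terms(beta, X, y) for X, y in partners]
    grad = sum(g for g, _ in terms)
    hess = sum(h for _, h in terms)
    beta = beta - np.linalg.solve(hess, grad)   # Newton step
print("distributed estimate:", beta.round(3))
```

Because only the summed gradient and Hessian cross institutional boundaries each round, the pooled-analysis answer is recovered without any patient-level records leaving a partner.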
Article
Full-text available
Large health care utilization databases are frequently used in a variety of settings to study the use and outcomes of therapeutics. Their size allows the study of infrequent events, their representativeness of routine clinical care makes it possible to study real-world effectiveness and utilization patterns, and their availability at relatively low cost without long delays makes them accessible to many researchers. However, concerns about database studies include data validity, lack of detailed clinical information, and a limited ability to control confounding. We consider the strengths, limitations, and appropriate applications of health care utilization databases in epidemiology and health services research, with particular reference to the study of medications. Progress has been made on many methodologic issues related to the use of health care utilization databases in recent years, but important problem areas persist and merit scrutiny.
Article
Full-text available
The Cancer Research Network (CRN) comprises the National Cancer Institute and 11 nonprofit research centers affiliated with integrated health care delivery systems. The CRN, a public/private partnership, fosters multisite collaborative research on cancer prevention, screening, treatment, survival, and palliation in diverse populations. The CRN's success hinges on producing innovative cancer research that likely would not have been developed by scientists working individually, and then translating those findings into clinical practice within multiple population laboratories. The CRN is a collaborative virtual research organization characterized by user-defined sharing among scientists and health care providers of data files as well as direct access to researchers, computers, software, data, research participants, and other resources. The CRN's research management Web site fosters a high-functioning virtual scientific community by publishing standardized data definitions, file specifications, and computer programs to support merging and analyzing data from multiple health care systems. Seven major types of standardized data files developed to date include demographics, health plan eligibility, tumor registry, inpatient and ambulatory utilization, medication dispensing, laboratory tests, and imaging procedures; more will follow. Data standardization avoids rework, increases multisite data integrity, increases data security, generates shorter times from initial proposal concept to submission, and stimulates more frequent collaborations among scientists across multiple institutions. The CRN research management Web site and associated standardized data files and procedures represent a quasi-public resource, and the CRN stands ready to collaborate with researchers from outside institutions in developing and conducting innovative public domain research.
Article
Full-text available
Allopurinol dosage reduction is recommended in patients with renal dysfunction because drug toxicity risk is increased. Little information is available about serum creatinine (SCr) monitoring in ambulatory patients taking allopurinol. The objectives were to evaluate SCr monitoring among patients prescribed allopurinol, identify associated factors, and evaluate administrative data in assessing monitoring. Information for this retrospective cohort study was drawn from a dataset of 2,020,037 individuals, approximately 200,000 members from each of 10 organizations. Study patients had received at least one year of ongoing allopurinol prescription dispensings. Patient variables analyzed included age, gender, chronic diseases, outpatient visits, hospitalizations, gout diagnosis, and SCr monitoring. A random sample of medical records was reviewed to assess the accuracy of the automated data. Statistical analysis included descriptive and logistic regression techniques. Overall, 1139 (26%) of 4357 patients did not have SCr monitoring. For individuals without recent hospitalization, factors protective against lack of monitoring were increasing age (OR 0.77 per 10 y; 95% CI 0.74 to 0.79), more chronic diseases (OR 0.81; 95% CI 0.78 to 0.83), more outpatient visits (OR 0.87 per 5 visits; 95% CI 0.83 to 0.91), and gout diagnosis (OR 0.74; 95% CI 0.65 to 0.85). The sensitivity and specificity of administrative data compared with medical records for SCr monitoring were 92% and 65%, respectively. More than one-fourth of patients dispensed allopurinol did not have SCr monitoring during one year of therapy. Lack of monitoring and lack of subsequent possible dosage adjustment put patients at increased risk of allopurinol toxicity.
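Adjusted odds ratios like those reported above are conventionally obtained by exponentiating logistic-regression coefficients and their confidence limits. The sketch below does this on simulated data with statsmodels; the covariate scalings (per decade of age, per five visits) mirror the abstract, but the data and numbers are not the study's.

```python
# Sketch of how adjusted odds ratios with 95% CIs are typically
# computed: fit a logistic model, then exponentiate coefficients and
# their confidence limits. Simulated data; scalings mirror the
# abstract (per decade of age, per 5 visits) but numbers are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 3000
age10 = rng.normal(6.5, 1.2, n)             # age in decades
visits5 = rng.poisson(2, n)                 # outpatient visits per 5
lin = 1.5 - 0.26 * age10 - 0.14 * visits5   # log-odds of NOT being monitored
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

X = sm.add_constant(np.column_stack([age10, visits5]))
res = sm.Logit(y, X).fit(disp=0)
or_est = np.exp(res.params[1:])             # skip the intercept
or_ci = np.exp(res.conf_int()[1:])
print("OR per 10 y of age:", or_est[0].round(2), or_ci[0].round(2))
print("OR per 5 visits:  ", or_est[1].round(2), or_ci[1].round(2))
```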
Article
Full-text available
To describe the proportion of patients receiving drugs with a narrow therapeutic range who lacked serum drug concentration monitoring during a 1-year period of therapy and to identify patient characteristics associated with lack of monitoring. Retrospective cohort. Ambulatory patients (n = 17,748) at 10 health maintenance organizations who were receiving ongoing continuous drug therapy with digoxin, carbamazepine, divalproex sodium, lithium carbonate, lithium citrate, phenobarbital sodium, phenytoin, phenytoin sodium, primidone, quinidine gluconate, quinidine sulfate, procainamide hydrochloride, theophylline, theophylline sodium glycinate, tacrolimus, or cyclosporine for at least 12 months between January 1, 1999, and June 30, 2001, were identified. Serum drug concentration monitoring was assessed from administrative data and from medical record data. Fifty percent or more of patients receiving digoxin, theophylline, procainamide, quinidine, or primidone were not monitored, and 25% to 50% of patients receiving divalproex, carbamazepine, phenobarbital, phenytoin, or tacrolimus were not monitored. Younger age was associated with lack of monitoring for patients prescribed digoxin (adjusted odds ratio, 1.86; 95% confidence interval, 1.39-2.48) and theophylline (adjusted odds ratio, 1.58; 95% confidence interval, 1.23-2.04), while older age was associated with lack of monitoring for patients prescribed carbamazepine (adjusted odds ratio, 0.59; 95% confidence interval, 0.44-0.80) and divalproex (adjusted odds ratio, 0.50; 95% confidence interval, 0.38-0.66). Patients with fewer outpatient visits were also less likely to be monitored (P < .001). A substantial proportion of ambulatory patients receiving drugs with narrow intervals between doses resulting in beneficial and adverse effects did not have serum drug concentration monitoring during 1 year of use. Clinical implications of this finding need to be evaluated.
Article
Full-text available
Many systems for routine public health surveillance rely on centralized collection of individual, potentially identifiable personal health information (PHI) records. Although individual, identifiable patient records are essential for conditions for which there is mandated reporting, such as tuberculosis or sexually transmitted diseases, they are not routinely required for effective syndromic surveillance. Public concern about the routine collection of large quantities of PHI to support non-traditional public health functions may make alternative surveillance methods that do not rely on centralized identifiable PHI databases increasingly desirable. The National Bioterrorism Syndromic Surveillance Demonstration Program (NDP) is an example of one alternative model. All PHI in this system is initially processed within the secured infrastructure of the health care provider that collects and holds the data, using uniform software distributed and supported by the NDP. Only highly aggregated count data are transferred to the datacenter for statistical processing and display. Detailed, patient-level information is readily available to the health care provider to elucidate signals observed in the aggregated data, or for ad hoc queries. We briefly describe the benefits and disadvantages associated with this distributed processing model for routine automated syndromic surveillance. For well-defined surveillance requirements, the model can be successfully deployed with very low risk of inadvertent disclosure of PHI--a feature that may make participation in surveillance systems more feasible for organizations and more appealing to the individuals whose PHI they hold. It is possible to design and implement distributed systems to support non-routine public health needs if required.
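The distributed-processing model described can be summarized in one function: each provider reduces its own encounter records to daily syndrome counts by zip code, excluding repeat encounters, and only those counts leave the institution. The syndrome-to-code map and record layout below are illustrative.

```python
# Sketch of the distributed-processing idea: each provider aggregates
# its own encounters into daily syndrome counts by zip code, and only
# these counts leave the institution. The syndrome map and record
# layout are illustrative, not the NDP's actual configuration.
from collections import Counter
from datetime import date

SYNDROME_CODES = {"J06.9": "ILI", "R05": "ILI", "A09": "GI"}  # example map

def daily_counts(encounters):
    """encounters: iterable of (patient_id, date, zip5, dx_code).
    Returns {(date, zip5, syndrome): count}; PHI never leaves here."""
    counts = Counter()
    seen = set()
    for pid, d, zip5, dx in encounters:
        syndrome = SYNDROME_CODES.get(dx)
        if syndrome and (pid, d, syndrome) not in seen:  # drop repeat visits
            seen.add((pid, d, syndrome))
            counts[(d, zip5, syndrome)] += 1
    return counts

encounters = [
    ("p1", date(2024, 1, 2), "02115", "J06.9"),
    ("p1", date(2024, 1, 2), "02115", "R05"),   # same patient/day/syndrome
    ("p2", date(2024, 1, 2), "02115", "A09"),
]
print(daily_counts(encounters))   # only aggregate counts are transmitted
```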
Article
Full-text available
Amiodarone can cause liver and thyroid toxicity, but little is known about compliance with laboratory tests to evaluate liver and thyroid function among ambulatory patients who are dispensed amiodarone. The primary objective of this study was to identify the proportion of ambulatory patients who had liver aminotransferase and thyroid function tests during amiodarone therapy. Secondary objectives were to (1) describe factors associated with receipt of laboratory tests and (2) determine the accuracy of administrative data for assessing aminotransferase and thyroid function monitoring. This retrospective cohort study was conducted at 10 health maintenance organizations (HMOs) for the dates of service from January 1, 1999, through June 30, 2001. Participants included 1,055 patients dispensed amiodarone for at least 180 days within this date range; these patients were not necessarily new starts on amiodarone. Administrative claims data were analyzed to assess the percentage of patients with completed alanine/aspartate aminotransferase and thyroid function tests. Depending on the HMO site, electronic or paper medical records were reviewed to evaluate the validity of administrative claims data. Logistic regression models were used to explore factors associated with receipt of laboratory tests. Both aminotransferase and thyroid function tests were completed in 53.3% of patients within a 210-day follow-up period that included the 180-day period of amiodarone dispensings plus 30 days. Thyroid function, with or without liver function (aminotransferase tests), was assessed in 61.9% of patients, and aminotransferase tests, with or without thyroid function, were assessed in 68.2% of patients. After adjusting for patient characteristics and site, the factor most strongly associated with having both types of laboratory tests evaluated was concomitant therapy with a statin (adjusted odds ratio (OR) 1.55; 95% confidence interval (CI), 1.05-2.29). Other factors associated with having both types of laboratory tests evaluated included the number of outpatient visits in the 6 months before the period of amiodarone dispensings (adjusted OR 1.06; 95% CI, 1.00- 1.13 for each additional 5 visits) and living in a neighborhood where a higher median percentage of people had a high school or higher education (adjusted OR 1.09; 95% CI, 1.00-1.18 for every 10% increase in educational level at the block level). There was no association between monitoring and patient illness severity as measured by the number of comorbid conditions. On the basis of an evaluation of a randomly selected subset of 104 patient records, the sensitivity and specificity of automated data were 94.2% and 85.7% for aminotransferase tests and 83.3% and 81.1% for thyroid function tests, respectively. Approximately half of ambulatory patients dispensed amiodarone received both recommended laboratory tests for liver and thyroid function. Improved rates of testing for liver aminotransferase and thyroid function are needed for patients who receive amiodarone.
Article
Full-text available
Serum potassium and creatinine evaluation is recommended in patients prescribed spironolactone, yet the proportion of ambulatory patients chronically dispensed spironolactone receiving evaluation is not well understood. To estimate the rate of potassium and creatinine evaluation and identify factors associated with conducting these tests among ambulatory patients dispensed spironolactone. A retrospective cohort study was designed to evaluate patients at 10 health maintenance organizations with ongoing spironolactone dispensing for one year (N = 2257). Potassium and creatinine evaluation were determined from administrative data. Associations between patient characteristics and laboratory testing were assessed using logistic regression modeling. Serum creatinine and potassium were evaluated in 72.3% of patients during a 13-month period. The likelihood of potassium and creatinine monitoring was greater among patients who were older (OR 1.28; 95% CI 1.17 to 1.41 per decade of life); male (OR 1.25; 95% CI 1.01 to 1.54); had diabetes (OR 1.63; 95% CI 1.31 to 2.03); received concomitant therapy with angiotensin-converting enzyme inhibitors/angiotensin receptor blockers (OR 2.23; 95% CI 1.74 to 2.87), potassium supplements (OR 1.96; 95% CI 1.51 to 2.54), or digoxin (OR 2.10; 95% CI 1.48 to 2.98); or had more outpatient visits (OR 1.31; 95% CI 1.19 to 1.44). Among patients with heart failure (n = 790), factors associated with the incidence of laboratory testing were diabetes (OR 1.64; 95% CI 1.14 to 2.34), outpatient visits (OR 1.20; 95% CI 1.02 to 1.41), and digoxin therapy (OR 2.26; 95% CI 1.38 to 3.69). Three-fourths of ambulatory patients dispensed spironolactone receive recommended laboratory evaluation, with monitoring more likely to be completed in patients prescribed concomitant therapy with drugs that increase hyperkalemia risk, older patients, and those with diabetes.
Article
A vaccine against pandemic influenza may be rapidly and widely distributed, and could be used in populations with little prior exposure to influenza vaccines. Under such conditions, it will be important to gain timely information about the rates of vaccine adverse events, ideally by using electronic data from large populations. Many public and private health plans and payers have such information. Between May and September 2007, we conducted a decision maker interview and technical assessment with several health plans in the United States. The interview and survey evaluated technical capability, organizational capacity, and willingness to participate in a coordinated program of rapid safety research targeting pandemic and other influenza vaccines. Eleven health plans (eight private, three public) participated in the decision maker interview. Most interviewees were medical directors or held similar positions within their organizations. Participating plans provided coverage and/or care for approximately 150 million members in the U.S. Nine health plans completed a technical assessment survey. Most decision makers indicated interest and willingness to participate in a coordinated rapid safety surveillance program, and all reported the necessary claims data analysis experience. Respondents noted legal, procedural, budgetary, and technical barriers to participation. Senior decision makers representing private and public health plans were willing and asserted the ability of their organizations to participate in pandemic influenza vaccine safety monitoring. Developing working relationships, negotiating contracts, and obtaining necessary regulatory and legal approvals were identified as key barriers. These findings may be generalizable to other vaccines and pharmaceutical products.
Article
The National Bioterrorism Syndromic Surveillance Demonstration Program identifies new cases of illness from electronic ambulatory patient records. Its goals are to use data from health plans and practice groups to detect localized outbreaks and to facilitate rapid public health follow-up. Data are extracted nightly on patient encounters occurring during the previous 24 hours. Visits or calls with diagnostic codes corresponding to syndromes of interest are counted; repeat encounters are excluded. Daily counts of syndromes by zip code are sent to a central data repository, where they are statistically analyzed for unusual clustering by using a model-adjusted SaTScan approach. The results and raw data are displayed on a restricted website. Patient-level information stays at the originating health-care organization unless required by public health authorities. If a cluster surpasses a threshold of statistical aberration chosen by the corresponding public health department, an electronic alert can be sent to that department. The health department might then call a clinical responder, who has electronic access to records of cases contributing to clusters. The system is flexible, allowing for changes in participating organizations, syndrome definitions, and alert thresholds. It is transparent to clinicians and has been accepted by the health-care organizations that provide the data. The system's data are usable by local and national health agencies. Its software is compatible with commonly used systems and software and is mostly open-source. Ongoing activities include evaluating the system's ability to detect naturally occurring outbreaks and simulated terrorism events, automating and testing alerts and response capability, and evaluating alternative data sources.
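The clustering analysis itself uses a model-adjusted SaTScan space-time scan; as a much-simplified stand-in, the sketch below flags a day's count in one zip code when it is improbably large under a Poisson baseline estimated from recent history. The alert threshold is an illustrative assumption, echoing the system's configurable thresholds.

```python
# Simplified stand-in for the alerting step: flag a day's syndrome
# count in one zip code when it is improbably large under a Poisson
# baseline from recent history. The real system uses a model-adjusted
# SaTScan space-time scan; the threshold here is illustrative.
from math import exp, factorial

def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

def check_alert(history, today, threshold=0.001):
    baseline = sum(history) / len(history)   # expected daily count
    p = poisson_sf(today, baseline)
    return p < threshold, p

history = [2, 3, 1, 2, 4, 2, 3]   # past week of ILI counts for one zip
flagged, p = check_alert(history, today=12)
print(f"alert={flagged}, p={p:.5f}")
```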
Article
To identify correlates of laboratory monitoring errors in elderly health maintenance organization (HMO) members at the initiation of therapy with cardiovascular medications. Cross-sectional study in 10 HMOs in the United States. From a 2 million-member sample, individuals aged 65 and older who received one of seven cardiovascular medications (angiotensin-converting enzyme (ACE) inhibitors, angiotensin receptor blockers (ARBs), amiodarone, digoxin, diuretics, potassium supplements, and statins) and did not have recommended baseline monitoring performed during the 180 days before or 14 days after the index dispensing. The proportion of members receiving each drug for whom recommended laboratory monitoring was not performed. Laboratory monitoring error rates stratified by sex, age group, chronic disease score, and HMO site were examined, and logistic regression was used to identify predictors of laboratory monitoring errors. Error rates varied by medication class, ranging from 23% of patients receiving potassium supplementation without serum potassium and serum creatinine monitoring to 58% of patients receiving amiodarone who did not have recommended monitoring for thyroid and liver function. Highest error rates occurred in the youngest elderly for ACE inhibitors, ARBs, digoxin, diuretics, and potassium supplements, although in patients receiving amiodarone and statins, errors were most frequent in the oldest elderly. Errors occurred more frequently in patients with less comorbidity. Laboratory monitoring errors occurred frequently in elderly HMO members at the initiation of therapy with cardiovascular medications. Further study must examine the association between these errors and adverse outcomes.