Andrei Sura’s research while affiliated with University of Florida and other places


Publications (9)


Figure 1. A deterministic record linkage process.
Figure 2. The record linkage workflow of the OneFL Deduper tool.
Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network
  • Article
  • Full-text available

September 2019 · 307 Reads · 34 Citations

JAMIA Open

Jiang Bian · Alexander Loiacono · Andrei Sura · [...]

Objective: To implement an open-source tool that performs deterministic privacy-preserving record linkage (RL) in a real-world setting within a large research network. Materials and Methods: We learned 2 efficient deterministic linkage rules using publicly available voter registration data. We then validated the 2 rules' performance with 2 manually curated gold-standard datasets linking electronic health records and claims data from 2 sources. We developed an open-source Python-based tool, OneFL Deduper, that (1) creates seeded hash codes of combinations of patients' quasi-identifiers using a cryptographic one-way hash function to achieve privacy protection and (2) links and deduplicates patient records through a central broker that matches hash codes, with high precision and reasonable recall. Results: We deployed the OneFL Deduper (https://github.com/ufbmi/onefl-deduper) in OneFlorida, a state-based clinical research network that is part of the national Patient-Centered Clinical Research Network (PCORnet). Using the gold-standard datasets, we achieved a precision of 97.25% to 99.7% and a recall of 75.5%. With the tool, we deduplicated approximately 3.5 million (out of approximately 15 million) records down to 1.7 million unique patients across 6 health care partners and the Florida Medicaid program. We demonstrated the benefits of RL by examining different disease profiles of the linked cohorts. Conclusions: Many factors, including privacy risk considerations, policies and regulations, data availability and quality, and computing resources, can affect how an RL solution is constructed in a real-world setting. Nevertheless, RL is an important step toward improving data quality in a network so that reliable scientific discoveries can be drawn from these massive data resources.
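The two steps described in the abstract (seeded one-way hashing of quasi-identifier combinations at each site, then matching of hash codes by a central broker) can be sketched roughly as follows. This is a minimal illustration, not the OneFL Deduper code: the seed value, the normalization step, the choice of SHA-256, the field names, and the two rule definitions are all assumptions made for the example, and the abstract does not state which fields the actual rules combine.

```python
import hashlib
import unicodedata

# Hypothetical shared secret; a real deployment would distribute this securely.
SEED = "example-shared-seed"

# Hypothetical linkage rules: each rule is a combination of quasi-identifiers.
RULES = {
    "rule_1": ("first_name", "last_name", "dob", "sex"),
    "rule_2": ("first_name", "last_name", "dob", "zip"),
}


def normalize(value: str) -> str:
    """Lowercase and drop accents/punctuation so trivial variants hash alike."""
    text = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in text.lower() if ch.isascii() and ch.isalnum())


def seeded_hash(fields, seed=SEED):
    """One-way SHA-256 digest over the seed plus the normalized field values."""
    payload = seed + "|" + "|".join(normalize(f) for f in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def hash_record(record):
    """Site side: turn one patient record into one hash code per rule."""
    return {
        name: seeded_hash([record[field] for field in fields])
        for name, fields in RULES.items()
    }


def link(hashed_records):
    """Broker side: sees only record IDs and hash codes, never identifiers.

    Records that share a hash code under any rule receive the same link ID
    (union-find keeps the grouping transitive).
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}
    for record_id, hashes in hashed_records:
        parent[record_id] = record_id
        for rule, digest in hashes.items():
            key = (rule, digest)
            if key in first_seen:
                parent[find(record_id)] = find(first_seen[key])
            else:
                first_seen[key] = record_id
    return {record_id: find(record_id) for record_id in parent}


# Made-up example records from two hypothetical sites.
records = [
    ("siteA-1", {"first_name": "Maria", "last_name": "Lopez",
                 "dob": "1980-01-02", "sex": "F", "zip": "32601"}),
    ("siteB-7", {"first_name": "MARIA", "last_name": "López",
                 "dob": "1980-01-02", "sex": "F", "zip": "32608"}),
    ("siteB-9", {"first_name": "John", "last_name": "Smith",
                 "dob": "1975-05-05", "sex": "M", "zip": "32601"}),
]

links = link([(rid, hash_record(rec)) for rid, rec in records])
unique_patients = len(set(links.values()))
```

In this toy run, the first two records agree under the hypothetical rule_1 after normalization, so they share a link ID, and the third stays separate. Because matching is exact on hash codes, any discrepancy that normalization does not absorb (for example a mistyped birth date) produces a missed link, which is consistent with the abstract's pattern of high precision and lower recall.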


Table 1. Characteristics of T2DM trials in the US and the UK
Table 3. Number of T2DM trials that provided the results for the baseline measures
Table 4. Characteristics of patients enrolled in T2DM trials and T2DM patients in OneFlorida Data Trust and CALIBER
Table 5. GIST scores of age, BMI, and HbA1c of T2DM trials in different phases
Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus

April 2018 · 166 Reads · 6 Citations

AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

Clinical trials are indispensable tools for evidence-based medicine. However, they are often criticized for poor generalizability. Traditional generalizability assessment compares the enrolled patients with a convenience sample of real-world patients and can only be done after the trial results are published. The proliferation of electronic data in clinical trial registries and clinical data warehouses, however, offers a great opportunity to assess generalizability during the design phase of a new trial. In this work, we compared and contrasted the a priori (based on eligibility criteria) and a posteriori (based on enrolled patients) generalizability of Type 2 diabetes clinical trials. Further, we showed that comparing the study population selected by a trial's eligibility criteria to the real-world patient population is a good indicator of the trial's generalizability. Our findings demonstrate that the a priori generalizability of a trial is comparable to its a posteriori generalizability in identifying restrictive quantitative eligibility criteria.
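The a priori idea in the abstract, comparing the population selected by quantitative eligibility criteria against a real-world population, can be illustrated with a very simple sketch. This is not the GIST score used in the paper and not the authors' method; it only computes the share of a real-world cohort that would pass a set of range criteria, and how much each criterion excludes on its own. The `Criterion` class, the criteria ranges, and the four patient rows are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """A hypothetical quantitative eligibility criterion: low <= feature <= high."""
    feature: str
    low: float
    high: float


def is_eligible(patient, criteria):
    """True when the patient satisfies every range criterion."""
    return all(c.low <= patient[c.feature] <= c.high for c in criteria)


def eligible_fraction(patients, criteria):
    """Share of the real-world cohort that the criteria would admit."""
    if not patients:
        return 0.0
    return sum(is_eligible(p, criteria) for p in patients) / len(patients)


def restrictiveness(patients, criteria):
    """Share of the cohort excluded by each criterion taken alone."""
    return {c.feature: 1 - eligible_fraction(patients, [c]) for c in criteria}


# Made-up illustrative values, not data from OneFlorida or CALIBER.
patients = [
    {"age": 45, "bmi": 31.0, "hba1c": 7.5},
    {"age": 72, "bmi": 28.0, "hba1c": 8.2},
    {"age": 60, "bmi": 42.0, "hba1c": 9.0},
    {"age": 55, "bmi": 33.0, "hba1c": 11.5},
]
criteria = [
    Criterion("age", 18, 65),
    Criterion("bmi", 25, 40),
    Criterion("hba1c", 7.0, 10.0),
]

overall = eligible_fraction(patients, criteria)
per_criterion = restrictiveness(patients, criteria)
```

A low overall fraction, or one criterion that excludes far more patients than the others, flags a restrictive criterion before any patient is enrolled, which is the kind of signal the abstract attributes to a priori assessment.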





Top gene expression levels by CRC cancer stage
Principal component 1 and principal component 2 by cancer stage
Demographics of patients by CRC cancer stages
The top genes in the linear regression analysis
Colorectal cancer stages transcriptome analysis

November 2017 · 88 Reads · 36 Citations

Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths in the United States. The purpose of this study was to evaluate gene expression differences across the stages of CRC. Gene expression data on 433 CRC patient samples were obtained from The Cancer Genome Atlas (TCGA). Gene expression differences were evaluated across CRC stages using linear regression. Genes whose expression differed at p ≤ 0.001 were evaluated further in principal component analysis, and genes with p ≤ 0.0001 were evaluated further in gene set enrichment analysis. A total of 377 patients with gene expression data for 20,532 genes were included in the final analysis. The numbers of patients in stages I through IV were 59, 147, 116, and 55, respectively. The NEK4 gene, which encodes NIMA related kinase 4, was differentially expressed across the four stages of CRC. Stage I patients had the highest expression of NEK4, while stage IV patients had the lowest (p = 9 × 10⁻⁶). Ten other genes (RNF34, HIST3H2BB, NUDT6, LRCH4, GLB1L, HIST2H4A, TMEM79, AMIGO2, C20orf135, and SPSB3) had a p value of 0.0001 in the differential expression analysis. Principal component analysis indicated that patients from the 4 clinical stages do not appear to have distinct gene expression patterns. Network-based and pathway-based gene set enrichment analyses showed that these 11 genes map to multiple pathways, such as meiotic synapsis and packaging of telomere ends. Ten of these 11 genes were linked to Gene Ontology terms such as nucleosome, DNA packaging complex, and protein-DNA interactions. The protein complex-based gene set analysis showed that four genes were involved in H2AX complex II. This study identified a small number of genes that might be associated with the clinical stages of CRC. Our analysis was not able to find a molecular basis for the current clinical staging of CRC based on gene expression patterns.




Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus

November 2017 · 393 Reads


Citations (3)


... Although PPRL demonstrates significant potential in theory, such as its wide applicability in deidentified records within a public health surveillance system [11], PPRL for the context of a national statistical institute [12], privacy protection in medical data [13], and as a privacy-preserving tool for clinical research network [14], as well as achieving efficient record linkage in large datasets [15], its practical application has encountered certain limitations and challenges [16]. This is mainly due to the challenge of striking an ideal balance between computational efficiency and security. ...

Reference:

A Multi-Party Privacy-Preserving Record Linkage Method Based on Secondary Encoding
Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network

JAMIA Open

... Our findings are consistent with previous literature on clinical trial generalizability [16][17][18][19]. More SAEs were observed in real-world settings. ...

Comparing and Contrasting A Priori and A Posteriori Generalizability Assessment of Clinical Trials on Type 2 Diabetes Mellitus

AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium

... Several previous studies reported that AMIGO2 plays a role in cancer, but these studies exclusively investigated its role when expressed in tumor cells [17][18][19][20][21][22][23][24][25]. For example, knockdown of the AMIGO2 gene decreased proliferation, invasion and adhesion to liver endothelial cells, whereas overexpression accelerated these processes in CRC cell lines [17,18,22]. ...

Colorectal cancer stages transcriptome analysis