Project

The TB Portals: An open-access, web-based platform for global drug-resistant tuberculosis data sharing and analysis


Project log

Valeriu Crudu
added a research item
Motivation: Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short reads and PacBio long reads. We systematically studied short-read variant calling accuracy and the influence of sequence uniqueness, reference bias, and GC content. Results: Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and a minimum precision of 98.5% across the parameters evaluated. The approach that maximized variant recall while still maintaining high precision (>99%) was tuning the mapping quality (MQ) filtering threshold, i.e. the confidence of the read mapping (recall = 85.8%, precision = 99.1%, MQ ≥ 40). Additional masking of repetitive sequence content is an alternative, conservative approach to variant calling that increases precision at a cost to recall (recall = 70.2%, precision = 99.6%, MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS, including 52/168 PE/PPE genes (34.5%). From these results we present a refined list of low-confidence regions across the Mtb genome, which we found to frequently overlap with regions of structural variation, low sequence uniqueness, and low sequencing coverage. Our benchmarking results have broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems, and more generally for WGS applications in other organisms.
Availability: All relevant code is available at https://github.com/farhat-lab/mtb-illumina-wgs-evaluation Supplementary information: Supplementary data are available at Bioinformatics online.
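The precision/recall trade-off described in this abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names, the tuple layout of a call, and the toy data are all assumptions made for illustration. Only the idea of filtering calls by a mapping quality (MQ) threshold and scoring them against a truth set comes from the abstract.

```python
# Illustrative sketch only; names and toy data are hypothetical,
# not the authors' pipeline.

def filter_by_mq(calls, min_mq=40):
    """Keep (position, alt) for calls whose mapping quality is >= min_mq."""
    return {(pos, alt) for pos, alt, mq in calls if mq >= min_mq}

def precision_recall(called, truth):
    """Return (precision, recall) of a called variant set against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Toy truth set (standing in for long-read-derived variants) and
# toy short-read calls as (position, alt allele, mapping quality).
truth = {(100, "A"), (250, "T"), (400, "G"), (900, "C")}
calls = [(100, "A", 60), (250, "T", 45), (400, "G", 20), (700, "C", 10)]

unfiltered = precision_recall(filter_by_mq(calls, min_mq=0), truth)
filtered = precision_recall(filter_by_mq(calls, min_mq=40), truth)
print("MQ >= 0 :", unfiltered)
print("MQ >= 40:", filtered)
```

In this toy example, raising the MQ threshold removes one false call and one true call, so precision rises while recall falls, which is the kind of trade-off the abstract reports.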
Alex Rosenthal
added a research item
Availability of trained radiologists for fast processing of CXRs in regions burdened with tuberculosis has always been a challenge, affecting both timely diagnosis and patient monitoring. The paucity of annotated images of lungs of TB patients hampers attempts to apply data-oriented algorithms for research and clinical practices. The TB Portals Program database (TBPP, https://TBPortals.niaid.nih.gov) is a global collaboration curating a large collection of the most dangerous, hard-to-cure drug-resistant tuberculosis (DR-TB) patient cases. TBPP, with 1,179 (83%) DR-TB patient cases, is a unique collection that is well positioned as a testing ground for deep learning classifiers. As of January 2019, the TBPP database contains 1,538 CXRs, of which 346 (22.5%) are annotated by a radiologist and 104 (6.7%) by a pulmonologist, leaving 1,088 (70.7%) CXRs without annotations. The Qure.ai qXR artificial intelligence automated CXR interpretation tool was blind-tested on the 346 radiologist-annotated CXRs from the TBPP database. Qure.ai qXR CXR predictions for cavity, nodule, pleural effusion, and hilar lymphadenopathy successfully matched human expert annotations. In addition, we tested the 12 Qure.ai classifiers to find whether they correlate with treatment success (information provided by treating physicians). Ten descriptors were found to be significant: abnormal CXR (p = 0.0005), pleural effusion (p = 0.048), nodule (p = 0.0004), hilar lymphadenopathy (p = 0.0038), cavity (p = 0.0002), opacity (p = 0.0006), atelectasis (p = 0.0074), consolidation (p = 0.0004), indicator of TB disease (p < 0.0001), and fibrosis (p < 0.0001). We conclude that applying the fully automated Qure.ai CXR analysis tool is useful for fast, accurate, uniform, large-scale CXR annotation assistance, as it performed well even for DR-TB cases that were not used for initial training.
Testing artificial intelligence algorithms (encapsulating both machine learning and deep learning classifiers) on diverse data collections, such as TBPP, is critically important toward progressing to clinically adopted automatic assistants for medical data analysis.
Natalia Shubladze
added a research item
Multidrug-resistant tuberculosis (MDR-TB) refers to TB infection resistant to at least the two most powerful anti-TB drugs, isoniazid and rifampicin. It has been estimated that, globally, 3.5% of newly diagnosed TB patients (a proportion that can be much higher in some regions) and 20.5% of previously treated patients had MDR-TB. Extensively drug-resistant TB (XDR-TB) has resistance to rifampin and isoniazid, as well as to any member of the quinolone family and at least one of the second-line injectable drugs: kanamycin, amikacin, and capreomycin. XDR-TB accounts for 4-20% of MDR-TB. Early detection and targeted treatment are priorities for MDR-TB/XDR-TB control. Suspicion of MDR/XDR pulmonary TB (MDR-PTB or XDR-PTB) on chest imaging should prompt intensive diagnostic testing for MDR-PTB/XDR-PTB. We hypothesize that multiple nodular consolidation (NC) may serve as one of the differentiators for separating DS-PTB from MDR-PTB/XDR-PTB cases. For this study, MDR-PTB cases (n=310) and XDR-PTB cases (n=158) were from the NIAID TB Portals Program (TBPP) (https://tbportals.niaid.nih.gov). Drug-sensitive pulmonary TB (DS-PTB) cases were from the TBPP collection (n=112) as well as the Shenzhen Center for Chronic Disease Control (n=111), Shenzhen, China, and we excluded patients with HIV(+) status. Our study shows that NC, particularly multiple NCs, is more common in MDR-PTB than in DS-PTB, and more common in XDR-PTB than in MDR-PTB. For example, 2.24% of DS-PTB patients, 13.23% of MDR-PTB patients, and 20.89% of XDR-PTB patients, respectively, have two or more NCs with diameter ≥ 10 mm.
Valeriu Crudu
added a research item
Background: Recurrence of drug-resistant tuberculosis (DR-TB) after treatment occurs through relapse of the initial infection or reinfection by a new drug-resistant strain. Outbreaks of DR-TB in high burden regions present unique challenges in determining recurrence status for effective disease management and treatment. In the Republic of Moldova the burden of DR-TB is exceptionally high, with many cases presenting as recurrent. Methods: We performed a retrospective analysis of Mycobacterium tuberculosis from Moldova to better understand the genomic basis of drug resistance and its effect on the determination of recurrence status in a high DR-burden environment. To do this we analyzed genomes from 278 isolates collected from 189 patients, including 87 patients with longitudinal samples. These pathogen genomes were sequenced using Illumina technology, and SNP panels were generated for each sample for use in phylogenetic and network analysis. Discordance between genomic resistance profiles and clinical drug-resistance test results was examined in detail to assess the possibility of mixed infection. Results: There were clusters of multiple patients with 10 or fewer differences among DR-TB samples, which is evidence of person-to-person transmission of DR-TB. Analysis of longitudinally collected isolates revealed that many infections exhibited little change over time, though 35 patients demonstrated reinfection by divergent (number of differences > 10) lineages. Additionally, several same-lineage sample pairs were found to be more divergent than expected for a relapsed infection. Network analysis of the H3/4.2.1 clade found very close relationships among 61 of these samples, making differentiation of reactivation and reinfection difficult. 
There was discordance between the genomic profile and clinical drug sensitivity test results in twelve samples, and four of these had low-level (but not statistically significant) variation at DR SNPs, suggesting low-level mixed infections. Conclusions: Whole-genome sequencing provided a detailed view of the genealogical structure of the DR-TB epidemic in Moldova, showing that reinfection may be more prevalent than currently recognized. We also found increased evidence of mixed infection, which could be more robustly characterized with deeper levels of genomic sequencing.
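The divergence criterion mentioned in this abstract (more than 10 differences between longitudinal isolates taken as evidence of reinfection) can be sketched as follows. This is a simplified illustration, not the study's method: representing each isolate as an equal-length string of SNP-panel calls, skipping missing calls marked "N", and the function names are all assumptions. The abstract itself notes that closely related strains can make this distinction difficult, so the labels are deliberately tentative.

```python
# Illustrative sketch only; the data representation and names are hypothetical.

def snp_distance(a, b):
    """Count differing positions between two equal-length SNP-panel strings.

    Positions where either isolate has a missing call ('N') are skipped
    (an assumption, not stated in the abstract).
    """
    if len(a) != len(b):
        raise ValueError("SNP panels must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))

def classify_pair(first, later, threshold=10):
    """Label a longitudinal isolate pair using the >threshold rule."""
    if snp_distance(first, later) > threshold:
        return "possible reinfection"
    return "consistent with relapse"

# Toy isolates over a 20-site SNP panel.
baseline = "A" * 20
close_followup = "C" * 3 + "A" * 17      # 3 differences
divergent_followup = "C" * 12 + "A" * 8  # 12 differences

print(classify_pair(baseline, close_followup))
print(classify_pair(baseline, divergent_followup))
```

A hard cutoff like this is only a first-pass screen; in a setting with many near-identical circulating strains, a pair below the threshold could still represent reinfection from a closely related source.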
Natalia Shubladze
added a research item
The TB Portals Program is an international consortium of physicians, radiologists, and microbiologists from countries with a heavy burden of drug-resistant tuberculosis working with data scientists and IT professionals. Together, we have built the TB Portals, a repository of socioeconomic/geographic, clinical, laboratory, radiological, and genomic data from patient cases of drug-resistant tuberculosis and backed by shareable, physical samples. Currently, there are 1,299 total cases from five country sites (Azerbaijan, Belarus, Moldova, Georgia, and Romania), of which 976 (75.1%) are multi- or extensively drug resistant, and 38.2%, 51.9%, and 36.3% of cases contain X-ray, CT scan, and genomic data, respectively. The top Mycobacterium tuberculosis lineages represented among collected samples are Beijing, T1, and H3, and SNPs that confer resistance to isoniazid, rifampicin, ofloxacin, and moxifloxacin occur the most frequently. These data and samples have promoted drug discovery efforts and research into genomics and quantitative image analysis to improve diagnostics, while also serving as a valuable resource for researchers and clinical providers. The TB Portals database and associated projects are continually growing, and we invite new partners and collaborations in our initiative. The TB Portals data and their associated analytical and statistical tools are freely available at: https://tbportals.niaid.nih.gov