Zamin Iqbal's research while affiliated with EMBL-EBI and other places

Publications (193)

Preprint
Full-text available
While the malaria parasite P. falciparum has low average genome-wide diversity levels, likely due to its recent introduction from a gorilla-infecting ancestor (~10,000-50,000 years ago), some genes display extremely high diversity levels. In particular, certain proteins expressed on the surface of human red-blood-cell-infecting merozoites (merozoit...
Article
Full-text available
Universal access to drug susceptibility testing for newly diagnosed tuberculosis patients is recommended. Access to culture-based diagnostics remains limited, and targeted molecular assays are vulnerable to emerging resistance mutations. Improved protocols for direct-from-sputum Mycobacterium tuberculosis sequencing would accelerate access to compr...
Article
Outbreak strains of Mycobacterium tuberculosis are promising candidates as targets in the search for intrinsic determinants of transmissibility, as they are responsible for many cases with sustained transmission; however, the use of low-resolution typing methods and restricted geographical investigations represent flaws in assessing the success of...
Preprint
Full-text available
The antibiotic Bedaquiline (BDQ) is a key component of new WHO regimens for drug resistant tuberculosis (TB) but predicting BDQ resistance (BDQ-R) from genotypes remains challenging. We analysed a collection (n=505) of Mycobacterium tuberculosis from two high prevalence areas in South Africa (Cape Town and Johannesburg, 2019-2020), and found 53 ind...
Article
Background Mycobacterium tuberculosis whole-genome sequencing (WGS) has been widely used for genotypic drug susceptibility testing (DST) and outbreak investigation. For both applications, Illumina technology is used by most public health laboratories; however, Nanopore technology developed by Oxford Nanopore Technologies has not been thoroughly eva...
Preprint
Background Universal access to drug susceptibility testing for newly diagnosed tuberculosis patients is recommended. Access to culture-based diagnostics remains limited and targeted molecular assays are vulnerable to emerging resistance conferring mutations. Improved sample preparation protocols for direct-from-sputum sequencing of Mycobacterium tu...
Article
Full-text available
Background Viet Nam has high rates of antimicrobial resistance (AMR) but little capacity for genomic surveillance. This study used whole genome sequencing to examine the prevalence and transmission of three key AMR pathogens in two intensive care units (ICUs) in Hanoi, Viet Nam. Methods A prospective surveillance study of all adults admitted to IC...
Article
Full-text available
Background: Multidrug-resistant (MDR) Mycobacterium tuberculosis complex (MTBC) strains are a serious health problem in India, also contributing to one-fourth of the global MDR tuberculosis (TB) burden. About 36% of the MDR MTBC strains are reported fluoroquinolone (FQ) resistant leading to high pre-extensively drug-resistant (pre-XDR) and XDR-TB...
Article
Full-text available
The emergence of drug-resistant tuberculosis is a major global public health concern that threatens the ability to control the disease. Whole-genome sequencing as a tool to rapidly diagnose resistant infections can transform patient treatment and clinical practice. While resistance mechanisms are well understood for some drugs, there are likely man...
Article
Full-text available
The Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC) presents here a data compendium of 12,289 Mycobacterium tuberculosis global clinical isolates, all of which have undergone whole-genome sequencing and have had their minimum inhibitory concentrations to 13 antitubercular drugs measured in a single assay....
Article
Full-text available
There are many short-read variant-calling tools, with different strengths and weaknesses. We present a tool, Minos, which combines outputs from arbitrary variant callers, increasing recall without loss of precision. We benchmark on 62 samples from three bacterial species and an outbreak of 385 Mycobacterium tuberculosis samples. Minos also enables...
Article
The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function, and even anthropogenic perturbations such as the widespread use of antimicrobials. Whilst these archives are rich in data, considerable processing is required before a biological question can be addressed. Here, we have assembled...
Preprint
Background Outbreak strains are good candidates to look for intrinsic transmissibility as they are responsible for a large number of cases with sustained transmission. However, assessment of the success of long-lived outbreak strains has been flawed by the use of low-resolution typing methods and restricted geographical investigations. We now have...
Article
Full-text available
Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps read...
Article
Background: Molecular diagnostics are considered the most promising route to achieving rapid, universal drug susceptibility testing for Mycobacterium tuberculosis complex (MTBC). We aimed to generate a WHO-endorsed catalogue of mutations to serve as a global standard for interpreting molecular information for drug resistance prediction. Methods: A c...
Preprint
Full-text available
Background Mycobacterium tuberculosis whole-genome sequencing (WGS) using Illumina technology has been widely adopted for genotypic drug susceptibility testing (DST) and outbreak investigation. Oxford Nanopore Technologies is reported to have higher error rates but has not been thoroughly evaluated for these applications. Methods We analyse 151 is...
Article
Full-text available
Background Molecular diagnostics are considered the most promising route to achievement of rapid, universal drug susceptibility testing for Mycobacterium tuberculosis complex (MTBC). We aimed to generate a WHO-endorsed catalogue of mutations to serve as a global standard for interpreting molecular information for drug resistance prediction. Method...
Preprint
Full-text available
Background: Healthcare-associated infections (HCAIs) affect the most vulnerable persons in society and are increasingly difficult to treat in the face of mounting antimicrobial resistance (AMR). We used whole-genome sequencing (WGS) to retrospectively analyse carbapenemase-producing Gram negative bacteria from a single hospital in the United Kingdo...
Preprint
Full-text available
Viral sequence data from clinical samples frequently contain human contamination, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matc...
Article
Full-text available
Motivation Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized....
Article
Full-text available
The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function and even anthropogenic activities such as the widespread use of antimicrobials. However, these data consist of genomes assembled with different tools and levels of quality checking, and of large volumes of completely unprocessed r...
Preprint
Full-text available
Short-read variant calling for bacterial genomics is a mature field, and there are many widely used software tools. Different underlying approaches (e.g. pileup, local or global assembly, paired-read use, haplotype use) lend each tool different strengths, especially when considering non-SNP (single nucleotide polymorphism) variation or potentially di...
Article
Full-text available
We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of reference...
Article
Full-text available
Genome graphs allow very general representations of genetic variation; depending on the model and implementation, variation at different length-scales (single nucleotide polymorphisms (SNPs), structural variants) and on different sequence backgrounds can be incorporated with different levels of transparency. We implement a model which handles this...
Article
Full-text available
Background Multidrug-resistant Mycobacterium tuberculosis (Mtb) is a significant global public health threat. Genotypic resistance prediction from Mtb DNA sequences offers an alternative to laboratory-based drug-susceptibility testing. User-friendly and accurate resistance prediction tools are needed to enable public health and clinical practitio...
Article
Full-text available
Shigella sonnei is the most common agent of shigellosis in high-income countries, and causes a significant disease burden in low- and middle-income countries. Antimicrobial resistance is increasingly common in all settings. Whole genome sequencing (WGS) is increasingly utilised for S. sonnei outbreak investigation and surveillance, but comparison o...
Preprint
Full-text available
Background: Short-read whole genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences, and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized...
Preprint
Full-text available
Introduction Multidrug-resistant Mycobacterium tuberculosis (Mtb) is a significant global public health threat. Genotypic resistance prediction from Mtb DNA sequences offers an alternative to laboratory-based drug-susceptibility testing. User-friendly and accurate resistance prediction tools are needed to enable public health and clinical practit...
Preprint
Full-text available
Bedaquiline (BDQ) and clofazimine (CFZ) are core drugs for treatment of multidrug-resistant tuberculosis (MDR-TB); however, our understanding of the resistance mechanisms for these drugs is sparse, which is hampering rapid molecular diagnostics. To address this, we employed a unique approach using experimental evolution, protein modelling, genome se...
Preprint
Full-text available
The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function, and even anthropogenic activities such as the widespread use of antimicrobials. Whilst these archives are rich in data, considerable processing is required before biological questions can be addressed. Here, we assembled and char...
Article
Full-text available
Tuberculosis (TB) is an ancient disease affecting a plethora of domestic and wild animals, including humans. In primates, TB can cause severe multisystemic disease. The prevalence of TB in lemurs within Madagascar is unknown; the most recent documented case occurred in 1973 (1). Reverse zoonotic transmission of TB can occur when nonhuman primates a...
Preprint
Full-text available
Background: Standard approaches to characterising genetic variation revolve around mapping reads to a reference genome and describing variants in terms of differences from the reference; this is based on the assumption that these differences will be small and provides a simple coordinate system. However this fails, and the coordinates break down, w...
Preprint
Full-text available
Background Vietnam has high rates of antimicrobial resistance (AMR) but limited capacity for genomic surveillance. This study used whole genome sequencing (WGS) to examine the prevalence and transmission of three key AMR pathogens in two intensive care units in Hanoi, Vietnam. Methods A prospective surveillance study of all adults admitted to inte...
Preprint
Full-text available
Background Bacterial genomes follow a U-shaped frequency distribution whereby most genomic loci are either rare (accessory) or common (core); the union of these is the pan-genome. The alignable fraction of two genomes from a single species can be low (e.g. 50-70%), such that no single reference genome can access all single nucleotide polymorphisms...
Preprint
Full-text available
Shigella sonnei is the most common agent of shigellosis in high-income countries, and causes a significant disease burden in low- and middle-income countries. Antimicrobial resistance is increasingly common in all settings. Whole genome sequencing (WGS) is increasingly utilised for S. sonnei outbreak investigation and surveillance, but comparison of...
Article
Full-text available
The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context req...
Article
Two billion people are infected with Mycobacterium tuberculosis, leading to 10 million new cases of active tuberculosis and 1.5 million deaths annually. Universal access to drug susceptibility testing (DST) has become a World Health Organization priority. We previously developed a software tool, Mykrobe predictor, which provided offline species ide...
Article
Full-text available
Two billion people are infected with Mycobacterium tuberculosis, leading to 10 million new cases of active tuberculosis and 1.5 million deaths annually. Universal access to drug susceptibility testing (DST) has become a World Health Organization priority. We previously developed a software tool, Mykrobe predictor, which provided offline species i...
Chapter
We present COBS, a COmpact Bit-sliced Signature index, which is a cross-over between an inverted index and Bloom filters. Our target application is to index k-mers of DNA samples or q-grams from text documents and process approximate pattern matching queries on the corpus with a user-chosen coverage threshold. Query results may contain a number of...
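The query model described in this abstract (index the k-mers of each sample, then report samples that cover at least a chosen fraction of the query's k-mers) can be illustrated with a minimal sketch. This is not the COBS implementation: it uses one plain Bloom filter per document rather than a bit-sliced layout, and the names `BloomFilter`, `build_index`, and `query`, as well as the filter size and hash count, are assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(docs, k):
    """Build one Bloom filter per document from its k-mers."""
    index = {}
    for name, seq in docs.items():
        bf = BloomFilter()
        for kmer in kmers(seq, k):
            bf.add(kmer)
        index[name] = bf
    return index


def query(index, pattern, k, threshold):
    """Return documents whose filter contains >= threshold of the pattern's k-mers."""
    terms = kmers(pattern, k)
    if not terms:
        return []
    hits = []
    for name, bf in index.items():
        coverage = sum(term in bf for term in terms) / len(terms)
        if coverage >= threshold:
            hits.append(name)
    return hits


# Hypothetical toy corpus, for illustration only.
docs = {"sample1": "ACGTACGTGA", "sample2": "TTTTTTTTTT"}
index = build_index(docs, k=4)
matches = query(index, "ACGTACG", k=4, threshold=1.0)
```

Because a Bloom filter never reports a stored k-mer as absent, `sample1` is always returned here; other documents can only appear through false positives, which is the trade-off such an index accepts in exchange for compactness.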
Preprint
Full-text available
The characterization of de novo mutations in regions of high sequence and structural diversity from whole genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, where short reads do not capture the long-range context requir...
Preprint
We present COBS, a compact bit-sliced signature index, which is a cross-over between an inverted index and Bloom filters. Our target application is to index k-mers of DNA samples or q-grams from text documents and process approximate pattern matching queries on the corpus with a user-chosen coverage threshold. Query results may contain a number...
Article
New antibiotics are urgently needed to combat rising rates of resistance against all existing classes of antimicrobials. We highlight key issues that complicate the prediction of resistance evolution in the real world and outline the ways in which these can be overcome.
Article
Full-text available
The clinical phenotype of zoonotic tuberculosis and its contribution to the global burden of disease are poorly understood and probably underestimated. This shortcoming is partly because of the inability of currently available laboratory and in silico tools to accurately identify all subspecies of the Mycobacterium tuberculosis complex (MTBC). We p...
Article
Full-text available
Exponentially increasing amounts of unprocessed bacterial and viral genomic sequence data are stored in the global archives. The ability to query these data for sequence search terms would facilitate both basic research and applications such as real-time genomic epidemiology and surveillance. However, this is not possible with current methods. To s...
Article
Full-text available
BACKGROUND The World Health Organization recommends drug-susceptibility testing of Mycobacterium tuberculosis complex for all patients with tuberculosis to guide treatment decisions and improve outcomes. Whether DNA sequencing can be used to accurately predict profiles of susceptibility to first-line antituberculosis drugs has not been clear. METHO...
Article
Full-text available
Background: In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods: We compared three WGS-based bioinformatics meth...
Article
Full-text available
Colistin represents one of the few available drugs for treating infections caused by carbapenem-resistant Enterobacteriaceae. As such, the recent plasmid-mediated spread of the colistin resistance gene mcr-1 poses a significant public health threat, requiring global monitoring and surveillance. Here, we characterize the global distribution of mcr-1...
Article
Full-text available
Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables re...
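As a rough illustration of why the de Bruijn graph is described as having a single parameter k, the sketch below builds one from a set of reads: each k-mer contributes an edge from its first k-1 bases to its last k-1 bases. The function name `build_de_bruijn` and the toy reads are assumptions for illustration and do not come from the paper's software.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two hypothetical overlapping reads; both yield the same four 3-mers.
graph = build_de_bruijn(["ACGTAC", "CGTACG"], k=3)
```

Overlapping reads collapse onto the same nodes and edges, which is what keeps the structure tractable at high sequencing depth.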
Article
Full-text available
This study aimed to assess the feasibility of using the Oxford Nanopore Technologies (ONT) MinION long-read sequencer in reconstructing fully closed plasmid sequences from eight Enterobacteriaceae isolates of six different species with plasmid populations of varying complexity. Species represented were Escherichia coli, Klebsiella pneumoniae, Citro...
Article
Full-text available
Purpose: Speed of bloodstream infection diagnosis is vital to reduce morbidity and mortality. Whole genome sequencing (WGS) performed directly from liquid blood culture could provide single-assay species and antibiotic susceptibility prediction; however, high inhibitor and human cell/DNA concentrations limit pathogen recovery. We develop a method...
Preprint
Full-text available
The clinical phenotype of zoonotic tuberculosis, its contribution to the global burden of disease and prevalence are poorly understood and probably underestimated. This is partly because currently available laboratory and in silico tools have not been calibrated to accurately identify all subspecies of the Mycobacterium tuberculosis complex ( Mtbc...
Data
For all ontologies showing enrichment in within-patient BD-class variants, we identified the genes with variants contributing to the signal. We counted the number of protein-altering variants in these genes within patients, and compared to the number in long-term asymptomatic carriers. p-Values calculated using Fisher’s exact test. *Variant totals...
Article
Full-text available
Bacteria responsible for the greatest global mortality colonize the human microbiota far more frequently than they cause severe infections. Whether mutation and selection among commensal bacteria are associated with infection is unknown. We investigated de novo mutation in 1163 Staphylococcus aureus genomes from 105 infected patients with nose colo...
Data
List of all variants found within patients with S. aureus infections, location on shared reference (MRSA252), or position and reference genome name and accession number if variant could not be localized on MRSA252. Each variant is described by the alleles found, its location in gene, the predicted effect on gene product and the location of the vari...
Data
List of all cultures included in the site, the site of infection (and any known source if bloodstream), number of isolates sequenced from each site, ST or CC by in silico MLST, number of variants found at each site and the mean pair-wise difference comparing isolates.
Data
Neutrality indices show signals of adaptation among the genes, gene ontologies and expression pathways most significantly enriched for protein-altering B-class variants. Neutrality indices (NIs, 41,42) were calculated as the odds ratio of the number of protein-altering to synonymous variants among B-class versus C/D-class variants. These tests are...
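The odds ratio described above can be written out as a short sketch. The function name `neutrality_index` and the counts in the example are hypothetical and are not taken from the dataset; the calculation only follows the definition given in the description (protein-altering to synonymous variants among B-class versus C/D-class variants).

```python
def neutrality_index(altering_b, synonymous_b, altering_cd, synonymous_cd):
    """Odds ratio of protein-altering to synonymous variants, B-class vs C/D-class."""
    if synonymous_b == 0 or altering_cd == 0 or synonymous_cd == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    odds_b = altering_b / synonymous_b
    odds_cd = altering_cd / synonymous_cd
    return odds_b / odds_cd


# Hypothetical counts, for illustration only.
ni = neutrality_index(altering_b=12, synonymous_b=4, altering_cd=30, synonymous_cd=30)
```

With these made-up counts the B-class odds are 3 and the C/D-class odds are 1, so the index is 3; a value above 1 would indicate an excess of protein-altering variants among B-class variants relative to C/D-class variants.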
Data
List of all variants found within long term asymptomatic carriers, location on shared reference (MRSA252), or position and reference genome name and accession number if variant was not localized on MRSA252. Each variant is described by the alleles found, its location in gene and the predicted effect on gene product.
Preprint
Full-text available
Genome sequencing of pathogens is now ubiquitous in microbiology, and the sequence archives are effectively no longer searchable for arbitrary sequences. Furthermore, the exponential increase of these archives is likely to be further spurred by automated diagnostics. To unlock their use for scientific research and real-time surveillance we have com...
Article
Full-text available
Motivation: Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic test assumes that the presence of any well-studied single nucleotide polymorphisms is sufficient to cause resistance, which yields...