Article
PDF Available

A data mining approach to Spacer Oligonucleotide Typing of Mycobacterium tuberculosis

Authors:
  • St. Petersburg Pasteur Institute

Abstract and Figures

Motivation: The Direct Repeat (DR) locus of Mycobacterium tuberculosis is a suitable model for studying (i) the molecular epidemiology and (ii) the evolutionary genetics of tuberculosis. Both rely on a DNA analysis (genotyping) technique called spacer oligonucleotide typing (spoligotyping). In this paper, we investigated data analysis methods for discovering intelligible knowledge rules from spoligotyping data, an approach that had not previously been applied to this representation. We applied the C4.5 induction algorithm to produce knowledge rules, and then applied a Prototype Selection (PS) procedure to eliminate noisy data. This simplified both the decision rules and the number of spacers that must be tested to solve classification tasks. In the second part of the paper, we studied the contribution of 25 additional spacers, and the knowledge rules inferred from them, from a machine learning point of view. From a statistical point of view, we analyzed the correlations between spacers; both negative and positive correlations may be related to potential structural constraints within the DR locus that could shape its evolution directly or indirectly.
Results: By generating knowledge rules induced from decision trees, we showed that expert knowledge can not only be modeled but also improved and simplified to solve automatic classification tasks on unknown patterns. A practical consequence of this study may be a simplification of the spoligotyping technique, reducing experimental constraints and increasing the number of samples that can be processed.
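To make the rule-induction step concrete, here is a minimal Python sketch of the split criterion C4.5 uses (gain ratio), applied to binary spacer presence/absence patterns. The patterns, the family labels "A"/"B", and the function names are hypothetical, and this is not the authors' implementation: full C4.5 also restricts the choice to attributes with at least average information gain, handles missing values, and prunes the tree.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(patterns, labels, spacer):
    """C4.5-style gain ratio for splitting on one binary spacer (0 = absent, 1 = present)."""
    n = len(labels)
    subsets = {}
    for pattern, label in zip(patterns, labels):
        subsets.setdefault(pattern[spacer], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    # Split information penalizes tests that fragment the data into many small subsets.
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0

def best_spacer(patterns, labels):
    """Index of the spacer with the highest gain ratio (candidate root test)."""
    return max(range(len(patterns[0])), key=lambda s: gain_ratio(patterns, labels, s))

# Hypothetical 4-spacer patterns with hypothetical family labels.
patterns = [(1, 1, 0, 1), (1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 0)]
labels = ["A", "A", "B", "B"]
print(best_spacer(patterns, labels))  # 2
```

Spacer index 2 separates the two hypothetical families perfectly, so it receives a gain ratio of 1 and would become the root test; on real 43-spacer data the same computation is repeated recursively on each branch.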
... This technology has helped achieve great advances in the diagnosis of conditions such as Alzheimer's disease, multiple sclerosis, tumors, and others. However, a major drawback of MRI imaging is noise: noise in the environment surrounding the region of interest, noise from the MRI equipment, noise due to background tissue, patients' breathing motion, the presence of body fat in the area, and many other such shortcomings [52]. These various types of noise can limit the efficiency of MRI scanning techniques and prevent them from reaching their full potential. ...
... These algorithms are widely used in the literature [51,52]. Besides these, several other algorithms, such as the KNN algorithm [53], the k-means algorithm [54], the COVR-AHC algorithm [55], and several other methods, can be used to form clusters. ...
Chapter
Customers all over the world give ratings and reviews for the services they use. Internet websites (IWS) are a good platform for customers to share their views on the services offered. These reviews and ratings are useful to both customers and service providers: for business purposes they increase sales, and for customers they act as a source of information. IWS make it easy to share and access data, but the huge amount of data makes it difficult to analyze, so machine learning (ML) techniques have been developed for analysis, prediction, and recommendation. In this chapter, we collected ratings given by customers on air transport services from different sites. The ratings cover seat comfort, cabin staff, food and beverage, inflight entertainment, and more, and are combined into an overall rating on which the recommendation is based. We used different ML techniques to determine the overall sentiment of customers toward different service aspects and to give customers the most suitable recommendation. This helps customers, such as travelers, make decisions based on service type. We compared the Random Tree and Decision Tree ML techniques for recommendation prediction, using WEKA as the tool to apply them. Finally, the accuracy of the results is evaluated using precision, recall, and F-scores.
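The evaluation step can be sketched as per-class precision, recall, and F-score computed from true and predicted labels, similar to the per-class figures WEKA reports. This is a minimal Python sketch; the "yes"/"no" recommendation labels and the function name are hypothetical, not the chapter's data or code.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical "recommended?" labels for five reviews.
y_true = ["yes", "yes", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes"]
print(precision_recall_f1(y_true, y_pred, positive="yes"))
```

Here two of the three predicted "yes" labels are correct and two of the three true "yes" reviews are found, so precision, recall, and F1 are all 2/3.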
Chapter
Full-text available
Feature selection plays a crucial role in text classification by reducing the dimensionality of the features and accelerating the learning process of the classifier. Text classification is the process of assigning texts to categories based on their content and subject. Text classification techniques have been applied to various domains, such as medical, political, news, and legal texts, showing that adapting domain-relevant features can improve classification performance. Despite plenty of research on classification in many languages around the world, such work is lacking for Urdu because of a shortage of resources. In this paper, we first present a hybrid feature selection approach (HFSA) for text classification of Urdu news articles. Second, we combine widely used filter selection approaches with Latent Semantic Indexing (LSI) to extract essential features of Urdu documents. The hybrid approach was tested with a Support Vector Machine (SVM) classifier on the Urdu “ROSHNI” dataset, and the results were compared with those obtained by individual filter feature selection methods and by a baseline feature selection method. The proposed approach achieves better classification, with promising accuracy and efficiency.
... Kamal J. AbuHassan et al. [38] studied the automated diagnosis of tuberculosis based on plasmonic ELISA and color-based image classification. They developed a mobile-based point-of-care (POC) platform for TB diagnosis. ...
Article
Full-text available
In medical data systems, where data grow unpredictably in size, physicians and medical researchers can struggle to analyze and manage the data because of its increasing volume and variety. Traditional statistical analysis has therefore become insufficient, and knowledge mining methods are needed, incorporating tasks such as knowledge extraction, data archaeology, data exploration, data pattern processing, data dredging, information harvesting, and other techniques of knowledge discovery in databases and data analysis. Diseases are sometimes difficult to detect because of similar symptoms or other reasons; for example, a disease may go undiagnosed because of ambiguous symptoms, irregular sample collection, or medical errors. This review article discusses a key element of data processing and aims to describe a knowledge discovery process for sharing specific symptoms related to the occurrence of other diseases, such as tuberculosis (TB). Using classification rules, decision tree algorithms, and prediction tools to infer defining characteristics, researchers try to establish the relevance of various test components and thereby discover hidden knowledge, unexpected patterns, and new rules in the database. The aim is to identify a new approach to data mining that can further improve research on tuberculosis and data mining techniques.
... In the last decade, researchers have improved methods for developing high-throughput computational algorithms that extract biologically meaningful information from genomic and proteomic datasets, whose increasingly complex and extensive nature challenges traditional methods [9,10]. Data mining techniques provide efficient and effective tools for observing and analyzing large volumes of data by revealing important patterns and correlations, which may ultimately elucidate the underlying mechanisms of biological function or disease [11–13]. Techniques within the artificial intelligence/machine learning and statistics realms paired with ...
Article
Full-text available
Serological diagnosis of active tuberculosis (TB) is enhanced by detection of multiple antibodies due to variable immune responses among patients. Clinical interpretation of these complex datasets requires development of suitable algorithms, a time consuming and tedious undertaking addressed by the automated machine learning platform MILO (Machine Intelligence Learning Optimizer). MILO seamlessly integrates data processing, feature selection, model training, and model validation to simultaneously generate and evaluate thousands of models. These models were then further tested for generalizability on out-of-sample secondary and tertiary datasets. Out of 31 antigens evaluated, a 23-antigen model was the most robust on both the secondary dataset (TB vs healthy) and the tertiary dataset (TB vs COPD) with sensitivity of 90.5% and respective specificities of 100.0% and 74.6%. MILO represents a user-friendly, end-to-end solution for automated generation and deployment of optimized models, ideal for applications where rapid clinical implementation is critical such as emerging infectious diseases.
... To extract meaningful information and interpret the results, high-throughput computational algorithms have been developed (Fojnica et al., 2016; Dande and Samant, 2018). In bioinformatics, data mining is the process of extracting useful information from deep within large datasets (Sebban et al., 2002; Zheng et al., 2008; Li et al., 2013). These techniques also involve artificial intelligence, statistics, machine learning, and visualization (Li et al., 2013; Dande and Samant, 2018). ...
Article
Full-text available
Background: The global burden of tuberculosis (TB) and antibiotic resistance is prompting researchers to develop novel, rapid diagnostic tools. Although conventional methods such as culture are considered the gold standard, they are time-consuming, and the disease has more opportunity to spread during the diagnostic procedure. The Xpert MTB/RIF assay provides a diagnosis within 2 h, but its low sensitivity in some sample types may allow the disease to progress to a more serious state. The role of computer technologies in diagnostic procedures is increasing. In the current study, we applied an artificial neural network (ANN) to predict TB disease from TB suspect data. Methods: We developed an ANN-based approach for predicting TB. Data were collected from TB suspects, their guardians, or caretakers, along with samples referred by TB units and health centers. All samples were processed and cultured. The ANN was trained on 12,636 records of TB patients collected during 2016 and 2017 at the provincial TB reference laboratory, Khyber Pakhtunkhwa, Pakistan. The suspect data were split into training (70%) and test (30%) sets, followed by validation and normalization. The ANN takes the TB suspect's information, such as gender, age, HIV status, previous TB history, sample type, and signs and symptoms, to predict TB. Results: Based on TB patient data, the ANN predicted Mycobacterium tuberculosis (MTB) positive or negative status with an overall accuracy of >94%, and the test and validation accuracies were >93%. This high accuracy in detecting TB among suspected patients may be useful for early disease management, helping to adopt control measures against further transmission and to reduce the drug resistance burden.
Conclusion: ANN algorithms may play an effective role in the early diagnosis of TB and could be applied as a supportive tool. Modern computer technologies should be adopted in diagnostics for rapid disease management. Delays in TB diagnosis and treatment initiation may allow new cases to emerge through transmission and contribute to high drug resistance in countries with a high TB burden.
... The spacer oligonucleotides were derived from 37 spacers of M. tuberculosis (Kamerbeek et al., 1997). ... The X family of strains is characterized by the absence of spacer 18 in the spoligotype (Sebban et al., 2002). This family, present in North America and Central America, could be linked to an Anglo-Saxon origin but has also been correlated with African-Americans (Dale et al., 2003). ...
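Spoligotypes are usually exchanged as binary strings or octal codes, and a rule such as "spacer 18 absent" is a single position check. The Python sketch below assumes the standard 43-spacer membrane format and the common 15-digit octal notation (spacers 1 to 42 read in triplets, spacer 43 appended as the last digit); the snippet itself only mentions the 37 M. tuberculosis-derived spacers. The function names and the example pattern are hypothetical, and the check implements only the single rule quoted above, not a full family assignment.

```python
def to_octal(spoligotype: str) -> str:
    """Convert a 43-character binary spoligotype ('1' = spacer present) to its 15-digit octal code."""
    if len(spoligotype) != 43 or set(spoligotype) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42 are read in 14 groups of three, each group becoming one octal digit.
    digits = [str(int(spoligotype[i:i + 3], 2)) for i in range(0, 42, 3)]
    # Spacer 43 is appended as the final digit (0 or 1).
    return "".join(digits) + spoligotype[42]

def lacks_spacer(spoligotype: str, spacer: int) -> bool:
    """True if the given 1-based spacer is absent from the pattern."""
    return spoligotype[spacer - 1] == "0"

# Hypothetical pattern: every spacer present except spacer 18.
pattern = "1" * 17 + "0" + "1" * 25
print(to_octal(pattern), lacks_spacer(pattern, 18))  # 777776777777771 True
```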
Thesis
Full-text available
The present study was designed to investigate the incidence of TB, rapid molecular diagnosis of drug resistance compared with conventional methods, and the genetic diversity of TB strains circulating in Iraq. From 1 February 2015 to 1 August 2015, 1173 clinical specimens were collected from patients attending the Specialized Chest and Respiratory Disease Center/National Reference Laboratory (NRL) for Tuberculosis in Baghdad. These comprised 909 specimens from pulmonary TB cases and 264 from extrapulmonary TB cases. The results revealed the following:
• Direct microscopic examination of all clinical specimens with Ziehl-Neelsen (ZN) staining, followed by culture on Löwenstein-Jensen (LJ) medium, showed that 193 (16.5%) specimens were positive by direct examination, while 253 (21.6%) were positive by culture on LJ medium; most culture-positive specimens were sputum (220/253, 87%). Of the 253 culture-positive patients attending the NRL, most were from Baghdad city (170, 67.2%). Males were more affected (158, 62.5%) than females (95, 37.5%). Most patients belonged to the 25-34 years age group (n=65, 25.7%), while the smallest age group was <14 years (n=6, 2.4%), with a highly significant difference among these results (p<0.01). To identify M. tuberculosis, 158 sputum isolates from LJ culture-positive specimens were subjected to five biochemical tests; most isolates (98.5%) belonged to M. tuberculosis, while 1.5% gave variable reactions. Drug susceptibility testing (DST) against first-line drugs showed that 42 (26.6%) isolates were sensitive to all drugs and 116 (73.4%) were drug-resistant, including 23 (14.6%) mono-drug-resistant, 17 (10.7%) poly-drug-resistant, and 76 (48.1%) multidrug-resistant (MDR) isolates.
• For rapid molecular diagnosis of M. tuberculosis, line probe assays (LPAs) were performed directly on 158 sputum smear specimens and on the 158 corresponding culture-positive isolates (grown from 10 smear-negative and 148 smear-positive specimens), using the GenoType MTBC assay followed by the GenoType MTBDRplus assay. Both assays gave results within 2 days, compared with at least 12 weeks for conventional methods. The first assay, used to identify and differentiate members of the M. tuberculosis complex, identified M. tuberculosis in 152 (96.2%) isolates, both directly from acid-fast bacilli (AFB) smear specimens and from culture, while 6 (3.8%) isolates were negative and identified as non-tuberculous mycobacteria (NTM); M. tuberculosis was thus the only species of the complex circulating among the 158 TB patients. The GenoType MTBC assay was superior to biochemical tests, which cannot give accurate results or differentiate M. tuberculosis from other Mycobacterium species. The second assay (GenoType MTBDRplus) was used to detect M. tuberculosis and its resistance to rifampicin (RIF) and isoniazid (INH), and also to detect NTM. This assay had a sensitivity of 100% for detecting M. tuberculosis from culture and 97.4% directly from AFB smear specimens (98.6% from smear-positive versus 71.4% from smear-negative specimens). Compared with DST, the GenoType MTBDRplus assay identified 42/158 (26.6%) isolates as sensitive to both RIF and INH and 110/158 (69.6%) as drug-resistant (77 MDR, 14 INH-resistant, and 19 RIF-resistant isolates); the remaining 6/158 (3.8%) isolates were identified as NTM with no specific drug-resistance pattern. The difference between the GenoType MTBDRplus assay and DST was not statistically significant.
Among the 110 drug-resistant isolates, the most common mutation conferring RIF resistance was S531L in the rpoB gene, identified in 76 (69.1%) isolates; other rpoB mutations were H526D in 8 (7.3%) isolates, H526Y in 6 (5.4%) isolates, and D516V in a single isolate (0.9%), with unknown mutations in 5 (4.5%) isolates. Similarly, INH resistance due to the katG S315T mutation was identified in 65 (59.1%) isolates, while INH resistance due to the C15T mutation in the inhA promoter region was identified in 35 (31.8%) isolates.
• This study was the first in Iraq to apply spoligotyping genotyping, at the Specialized Chest and Respiratory Disease Center/National Reference Laboratory; as a result, this genotyping method was adopted by the Ministry of Health NRL for tuberculosis in Baghdad. Spoligotyping was performed on 77 MDR isolates (directly from AFB smear specimens and from culture isolates). Two (2.6%) isolates gave negative results and were excluded, leaving data for 75 isolates. The results showed 11 lineages and 6 families; the predominant lineage was Delhi/CAS (n=27, 36%), while T and CAS were the main families, with 23 (30.7%) and 22 (29.3%) isolates, respectively. Spoligotyping yielded 36 patterns: 49 (65.3%) isolates grouped into 10 clusters (2 to 14 strains per cluster, with a recent transmission rate of 52%), while 26 (34.7%) isolates had unclustered, unique patterns (one strain only), 16 of which were orphan patterns. Spoligotype International Type 1144 (SIT1144) formed the largest cluster (n=14, 18.6%), and according to SITVIT2, the international database of the Pasteur Institute, Iraq is unique in having its own predominant strain (SIT1144).
The results showed two strains, SIT309 and SIT1916, that are globally rare. Other strains, such as SIT205 and SIT356, were each found in a single isolate; these are prevalent worldwide but had not previously been reported in Iraq. In addition, 3 (4%) strains were classified as unknown and could therefore be new strains or genotypes not yet reported worldwide.
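The 52% recent transmission rate quoted above is consistent with the widely used "n - 1" estimate, which assumes one index case per cluster: (clustered isolates - number of clusters) / total isolates = (49 - 10) / 75. The thesis does not state its formula, so treat this as an assumption; the function name below is hypothetical.

```python
def recent_transmission_index(clustered: int, clusters: int, total: int) -> float:
    """'n - 1' estimate: each cluster is assumed to contain exactly one index case."""
    if not 0 <= clusters <= clustered <= total or total == 0:
        raise ValueError("need 0 <= clusters <= clustered <= total and total > 0")
    return (clustered - clusters) / total

rate = recent_transmission_index(clustered=49, clusters=10, total=75)
print(f"{rate:.0%}")  # 52%
```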
Experiment Findings
Full-text available
The Innate Immune Response to Mycobacterium Tuberculosis is Dependent on Strain Lineage and on Host Population
Google Books: https://books.google.com.et/books/about/The_Innate_Immune_Response_to_Mycobacter.html?id=LsdUnQAACAAJ&redir_esc=y
Publication description: The genome structure of Mycobacterium tuberculosis is strongly clonal, in the absence of horizontal gene transfer. Thus, it is plausible that clonal lineages exhibit particular phenotypic characteristics, which may in turn result in differences in virulence or influence their association with particular host populations. Indeed, the global distribution of M. tuberculosis strains is not uniform, and certain strain lineages predominate in particular geographical areas. Further, there is evidence that some strain lineages are emerging, suggesting differences in virulence. First, we investigated the association between the strain genotype of M. tuberculosis and in vitro correlates of virulence, such as growth phenotype and cytokine induction, in the monocyte-derived macrophage (MDM) model. Second, to study the interaction between host genetic background and the innate immune response to different strains of M. tuberculosis, we conducted a cross-sectional study comparing cytokine responses to in vitro infection of MDM from healthy donors of different population groups with strains from different M. tuberculosis lineages. The inflammatory cytokines TNF, IL-12p40, IL-6, IL-1β, and GM-CSF were secreted at higher levels in response to infection with lineage 4 and lineage 3 strains than with lineage 2 and lineage 1 strains. Principal component analysis and linear modeling identified three inflammatory cytokines (IL-6, IL-12p40, and GM-CSF) as differentially secreted in all four population groups.
All research data were analyzed by me, Rajesh Sarkar, under the guidance of the UCT statistics department and the experts acknowledged in the thesis and dissertation. One postgraduate student from the statistics department obtained a postgraduate degree based on these data analyses. Accordingly, no misleading information will be accepted.
Book
Full-text available
The genome structure of Mycobacterium tuberculosis is strongly clonal, in the absence of horizontal gene transfer. Thus, it is plausible that clonal lineages exhibit particular phenotypic characteristics, which may in turn result in differences in virulence or influence their association with particular host populations. Indeed, the global distribution of M. tuberculosis strains is not uniform, and certain strain lineages predominate in particular geographical areas. Further, there is evidence that some strain lineages are emerging, suggesting differences in virulence. First, we investigated the association between the strain genotype of M. tuberculosis and in vitro correlates of virulence, such as growth phenotype and cytokine induction, in the monocyte-derived macrophage (MDM) model. We report that ‘modern’ clinical M. tuberculosis isolates from Cape Town (Lineage 2 and Lineage 4 strains) exhibit lineage-specific patterns of both growth in vitro (in broth and MDM) and cytokine responses in MDM. Second, to study the interaction between host genetic background and the innate immune response to different strains of M. tuberculosis, we conducted a cross-sectional study comparing cytokine responses to in vitro infection of MDM from healthy donors of different population groups with strains from different M. tuberculosis lineages. The inflammatory cytokines TNF, IL-12p40, IL-6, IL-1β, and GM-CSF were secreted at higher levels in response to infection with lineage 4 and lineage 3 strains than with lineage 2 and lineage 1 strains. Principal component analysis and linear modeling identified three inflammatory cytokines (IL-6, IL-12p40, and GM-CSF) as differentially secreted in all four population groups. In addition, we noted evidence of possible strain lineage-host ethnicity interactions for several cytokines. Together, these studies suggest that the genetic diversity of M. tuberculosis might influence the early innate immune response during tuberculosis infection, and that lineage-specific responses may be modulated by host genetic background.
Chapter
Operational analytics improves a firm's existing operations by focusing on process improvement. It acts as a business tool for resource management and data streamlining, which improves productivity, employee engagement, and customer satisfaction and provides investment opportunities. Crucial insights into a problem, which help determine key business strategy, can be obtained through various stages of data analysis and modeling, such as exploratory data analysis, predictive modeling, documentation, and reporting. In this work, a real-world dataset is used to predict the sales revenue of a restaurant. A second-stage regression model is built on top of base regression models: linear regression, ridge regression, and a decision tree regressor. Based on the results, the following findings are reported: (i) the annual sales revenue trend, (ii) food preferences in cities, (iii) demand variability, i.e., the effect of the first week of the month and of weekends, and (iv) a comparison against ensemble methods in terms of prediction accuracy. This work also suggests avenues for future research in this direction.
Article
Tuberculosis is a global public health problem that is resurgent in Venezuela, with an estimated 13 thousand new cases in 2018. Strains of the Mycobacterium tuberculosis RDRio subfamily belong to the Latin American Mediterranean (LAM) family and are a major cause of TB in Rio de Janeiro, Brazil. LAM strains predominate in Venezuela, where spoligotype SIT605 is common but, surprisingly, not found elsewhere. We sought to assess the presence of RDRio strains in tuberculosis patients in different regions of Venezuela and to determine whether SIT605 also belongs to the RDRio subfamily. Using spoligotyping and 24-locus MIRU-VNTR, we identified 86 clinical LAM and SIT605 isolates from the Venezuelan capital, Caracas, and several Venezuelan states. The region-of-difference deletion loci RD174 and RDRio, as well as IS1561, were used to identify strains of the RDRio subfamily, while IS6110 at position 932,204 and the Ag85C¹⁰³ polymorphism were used to validate SIT605 as a LAM family strain. We found that 69.8% of the isolates were RDRio, including 94.3% of strains isolated in Caracas, 17.9% of those isolated in the state of Carabobo, the two strains analyzed from Delta Amacuro, and one each from Sucre, Apure, and Aragua states. RDRio was present in 100% of SIT17 (LAM 2), SIT20 (LAM 1), SITs 93, 1694, 1696, 960, and 1367 (LAM 5), and SIT216 (LAM 9) strains, but in only 75% of SIT42 (LAM 9) strains. Thus, most LAM strains in Venezuela belong to the RDRio subfamily, whereas SIT605 strains, although LAM, do not.
Article
Full-text available
Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1–DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1–RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise.
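DROP1-DROP5 rely on lists of "associates" and take more space to write out. As a simpler illustration of noise-filtering instance removal of the kind such surveys review, here is Wilson's edited nearest neighbor rule in Python; it is a classic filter, not one of the DROP algorithms. Hamming distance over binary attributes, k = 3, the strict-majority criterion, and the example data are assumptions chosen to suit presence/absence patterns such as spoligotypes.

```python
def hamming(a, b):
    """Number of positions at which two binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def wilson_enn(X, y, k=3):
    """Wilson's edited nearest neighbor: keep an instance only if its own class
    is a strict majority among its k nearest neighbors (computed on the full set)."""
    kept = []
    for i, xi in enumerate(X):
        # Neighbors sorted by distance, ties broken by index for determinism.
        neighbors = sorted((j for j in range(len(X)) if j != i),
                           key=lambda j: (hamming(xi, X[j]), j))[:k]
        same_class = sum(1 for j in neighbors if y[j] == y[i])
        if 2 * same_class > k:
            kept.append(i)
    return kept

# Hypothetical binary patterns; the last one carries a likely noisy label.
X = [(1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1),
     (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (1, 1, 0, 1)]
y = ["A", "A", "A", "B", "B", "B", "B"]
print(wilson_enn(X, y))  # [0, 1, 2, 3, 4, 5]
```

All removal decisions are made against the original set, so the result does not depend on processing order; the last instance is dropped because two of its three nearest neighbors belong to class A.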
Article
Full-text available
This paper deals with phylogenetic relationships among a set of 90 clinical strains representative of the worldwide diversity of the Mycobacterium tuberculosis complex (Kremer et al. 1999) using eight independent genetic markers: IS6110, IS1081, the direct repeat (DR) locus, and five variable number of tandem DNA repeat loci (VNTR). In a preliminary experiment, phylogenetic trees based on single markers were constructed that led to the detection of some similarities between the VNTR-based and the spoligotyping-based phylogenetic trees. In the second step, a more global phenetic approach based on pairwise comparison of strains within each typing system was used, followed by calculations of mean genetic distances based on all the eight loci and the use of the neighbor-joining algorithm for tree reconstruction. This analysis confirmed our preliminary observations and suggested the existence of at least two new phylogeographical clades of M. tuberculosis, one defined as the "East African–Indian family" (EA-I), which may find its origin on the African or Asian continents, and the other as the "Latin American and Mediterranean" (LA-M) family. The existence of these two families was also validated by an independent phylogenetic analysis of spoligotyping on a larger set of shared types (n = 252) and further corroborated by VNTR and katG–gyrA results. The potential origin of these families of bacilli is discussed based on cattle domestication and human migration history. In conclusion, the information contained in insertion sequence and repetitive DNAs may serve as a model for the phylogenetic reconstruction of the M. tuberculosis complex.
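The neighbor-joining step named above can be sketched as a generic textbook implementation that works on any symmetric distance matrix, such as one of mean genetic distances. This is a minimal Python sketch, not the authors' code; the five-taxon matrix is a standard illustrative example rather than strain data, and the internal node names u0, u1, ... are hypothetical.

```python
def neighbor_joining(names, dist):
    """Return tree edges (parent, child, branch_length) built by neighbor joining.

    names: list of taxon labels; dist: symmetric matrix (list of lists) of distances.
    """
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(a): sum of distances from a to every other active node.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"u{counter}"
        counter += 1
        # Branch lengths from the new internal node u to a and b.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, la), (u, b, d[a][b] - la)]
        # Distances from u to the remaining active nodes.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges

names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
for edge in neighbor_joining(names, D):
    print(edge)
```

On this matrix the first join pairs a and b through u0 with branch lengths 2 and 3, and the finished unrooted tree has a total length of 17.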
Article
Full-text available
Storing and using specific instances improves the performance of several supervised learning algorithms. These include algorithms that learn decision trees, classification rules, and distributed networks. However, no investigation has analyzed algorithms that use only specific instances to solve incremental learning tasks. In this paper, we describe a framework and methodology, called instance-based learning, that generates classification predictions using only specific instances. Instance-based learning algorithms do not maintain a set of abstractions derived from specific instances. This approach extends the nearest neighbor algorithm, which has large storage requirements. We describe how storage requirements can be significantly reduced with, at most, minor sacrifices in learning rate and classification accuracy. While the storage-reducing algorithm performs well on several real-world databases, its performance degrades rapidly with the level of attribute noise in training instances. Therefore, we extended it with a significance test to distinguish noisy instances. This extended algorithm's performance degrades gracefully with increasing noise levels and compares favorably with a noise-tolerant decision tree algorithm.
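The storage-reducing algorithm described above (commonly referred to as IB2) stores an incoming instance only if the instances already stored misclassify it. This minimal Python sketch omits the significance test of the noise-tolerant extension; the one-dimensional data, the absolute-difference distance, and the function name are hypothetical.

```python
def ib2(stream, distance):
    """Keep an incoming (x, label) instance only if the current stored set
    misclassifies it with the 1-nearest-neighbor rule (first instance always kept)."""
    stored = []
    for x, label in stream:
        if stored:
            _, nearest_label = min(stored, key=lambda s: distance(x, s[0]))
            if nearest_label == label:
                continue  # correctly classified, so not stored
        stored.append((x, label))
    return stored

stream = [(1.0, "A"), (1.2, "A"), (5.0, "B"), (5.3, "B"), (2.9, "B"), (1.1, "A")]
print(ib2(stream, distance=lambda a, b: abs(a - b)))
# [(1.0, 'A'), (5.0, 'B'), (2.9, 'B')]
```

Only three of the six instances are kept. Because a noisy instance is, by definition, likely to be misclassified, it is also likely to be stored, which is the sensitivity to noise that motivates the significance test.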
Article
A further modification to Cover and Hart's nearest neighbor decision rule, the reduced nearest neighbor rule, is introduced. Experimental results demonstrate its accuracy and efficiency.
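The reduced nearest neighbor rule starts from Hart's condensed subset and then tries to drop each stored instance, keeping a deletion only if every training instance is still classified correctly by 1-NN on what remains. A minimal Python sketch under those assumptions follows; the one-dimensional data and the helper names are hypothetical.

```python
def nn_label(x, subset, X, y, distance):
    """Label of the nearest instance (by index into X) within `subset`."""
    j = min(subset, key=lambda k: distance(x, X[k]))
    return y[j]

def condensed_nn(X, y, distance):
    """Hart's condensed nearest neighbor: indices of a subset consistent with the training set."""
    store = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i not in store and nn_label(X[i], store, X, y, distance) != y[i]:
                store.append(i)
                changed = True
    return store

def reduced_nn(X, y, distance):
    """Reduced nearest neighbor: prune the condensed subset while it stays consistent."""
    subset = condensed_nn(X, y, distance)
    for i in list(subset):
        trial = [k for k in subset if k != i]
        if trial and all(nn_label(X[m], trial, X, y, distance) == y[m] for m in range(len(X))):
            subset = trial
    return subset

X = [5.0, 0.0, 10.0, 6.0]
y = ["A", "A", "B", "B"]
dist = lambda a, b: abs(a - b)
print(condensed_nn(X, y, dist), reduced_nn(X, y, dist))  # [0, 2, 3] [0, 3]
```

The condensing pass keeps the point at 10.0 because it was added before 6.0; once 6.0 is stored, 10.0 is redundant, and the reduction pass removes it.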
Article
Mycobacterial interspersed repetitive units (MIRUs) are 40–100 bp DNA elements often found as tandem repeats and dispersed in intergenic regions of the Mycobacterium tuberculosis complex genomes. The M. tuberculosis H37Rv chromosome contains 41 MIRU loci. After polymerase chain reaction (PCR) and sequence analyses of these loci in 31 M. tuberculosis complex strains, 12 of them were found to display variations in tandem repeat copy numbers and, in most cases, sequence variations between repeat units as well. These features are reminiscent of those of certain human variable minisatellites. Of the 12 variable loci, only one was found to vary among genealogically distant BCG substrains, suggesting that these interspersed bacterial minisatellite-like structures evolve slowly in mycobacterial populations.
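Identifying the variable loci amounts to checking, for each locus, whether more than one tandem-repeat copy number occurs across the strains typed. A minimal Python sketch; the strain names, locus names, and copy numbers are hypothetical and are not the 31-strain data.

```python
def variable_loci(profiles):
    """profiles maps strain -> {locus: tandem-repeat copy number}.
    Return, sorted, the loci whose copy number differs between at least two strains."""
    loci = {locus for profile in profiles.values() for locus in profile}
    return sorted(
        locus for locus in loci
        if len({p[locus] for p in profiles.values() if locus in p}) > 1
    )

# Hypothetical copy-number profiles for three strains at three loci.
profiles = {
    "strain1": {"locus_A": 2, "locus_B": 3, "locus_C": 1},
    "strain2": {"locus_A": 2, "locus_B": 5, "locus_C": 1},
    "strain3": {"locus_A": 2, "locus_B": 3, "locus_C": 2},
}
print(variable_loci(profiles))  # ['locus_B', 'locus_C']
```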
Chapter
Statistics is a subject of many uses and surprisingly few effective practitioners. The traditional road to statistical knowledge is blocked, for most, by a formidable wall of mathematics. The approach in An Introduction to the Bootstrap avoids that wall. It arms scientists and engineers, as well as statisticians, with the computational techniques they need to analyze and understand complicated data sets.
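As a small illustration of the computational approach the book advocates, here is a percentile bootstrap confidence interval for a mean, written with only the Python standard library. The data, the number of replicates, the seed, and the function name are hypothetical choices, and the percentile interval is only one of several bootstrap intervals.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` computed on `data`."""
    rng = random.Random(seed)
    # Each replicate recomputes the statistic on a resample drawn with replacement.
    reps = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lower = reps[int(n_boot * alpha / 2)]
    upper = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]  # hypothetical measurements
print(bootstrap_ci(data))
```

Fixing the seed makes the interval reproducible; passing another function as `stat` (for example `statistics.median`) gives an interval for that statistic with no extra derivation.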