Chemistry & Biology
Bayesian Models Leveraging Bioactivity
and Cytotoxicity Information for Drug Discovery
Sean Ekins,1,2,7,* Robert C. Reynolds,3,8 Hiyun Kim,4 Mi-Sun Koo,4 Marilyn Ekonomidis,4 Meliza Talaue,4 Steve D. Paget,4
Lisa K. Woolhiser,6 Anne J. Lenaerts,6 Barry A. Bunin,1 Nancy Connell,4 and Joel S. Freundlich4,5,7,*
1Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA
2Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA
3Southern Research Institute, 2000 Ninth Avenue South, Birmingham, AL 35205, USA
4Department of Medicine, Center for Emerging and Reemerging Pathogens
5Department of Pharmacology and Physiology
New Jersey Medical School, University of Medicine and Dentistry of New Jersey, 185 South Orange Avenue, Newark, NJ 07103, USA
6Department of Microbiology, Immunology and Pathology, Colorado State University, 200 West Lake Street, Fort Collins, CO 80523, USA
7These authors contributed equally to this work
8Present address: Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South,
Birmingham, AL 35294-1240, USA
*Correspondence: firstname.lastname@example.org (S.E.), email@example.com (J.S.F.)
Identification of unique leads represents a significant
challenge in drug discovery. This hurdle is magnified
in neglected diseases such as tuberculosis. We have
leveraged public high-throughput screening (HTS)
data to experimentally validate a virtual screening
approach employing Bayesian models built with
bioactivity information (single-event model) as well
as bioactivity and cytotoxicity information (dual-
event model). We virtually screened a commercial
library and experimentally confirmed actives with
hit rates exceeding typical HTS results by one to
two orders of magnitude. This initial dual-event
Bayesian model identified compounds with antitu-
bercular whole-cell activity and low mammalian cell
cytotoxicity from a published set of antimalarials.
The most potent hit exhibits the in vitro activity and
in vitro/in vivo safety profile of a drug lead. These
Bayesian models offer significant economies in
time and cost to drug discovery.
Modern drug discovery must be more time and cost efficient
in discovering novel therapeutics. These challenges are felt
even more significantly in the search for neglected disease
treatments, where public-private partnerships coordinate drug
discovery with very limited resources. A prime example is tuber-
culosis (TB), caused by Mycobacterium tuberculosis (Mtb),
which infects approximately one-third of the world’s population
and results in 1.7–1.8 million deaths annually (Lienhardt et al.,
2012a). New drugs active against Mtb are urgently needed to
combat a pandemic heavily affected by resistance to available
therapies and coinfection with HIV/AIDS (Nuermberger et al.,
2010). TB drug discovery is challenging, reflected in the lack of
a new TB-focused therapeutic approved in over 40 years (Gros-
set et al., 2012; Sacchettini et al., 2008). One response has been
to screen very large compound libraries (Ananthan et al., 2009;
Maddry et al., 2009; Reynolds et al., 2012), hoping to deliver
on the promise of chemical diversity (O’Connor et al., 2012).
Phenotypic whole-cell high-throughput screens of commercial
libraries have searched for inhibitors of mycobacterial growth,
at a cost of millions of dollars, with resultant low single-digit (or
less) hit rates (Macarron et al., 2011; Magnet et al., 2010; Mak
et al., 2012; Stanley et al., 2012). The campaigns have resulted
in numerous hits, but resource constraints have limited follow-
up to the few most promising compounds and/or compound
series. Fortunately, one screen of the nonpathogenic Mycobac-
terium smegmatis unearthed a diarylquinoline hit that led to the
clinical candidate bedaquiline (Andries et al., 2005), whereas
another resulted in the early-phase candidate SQ109 (Lee
et al., 2003). Although SQ109 arose directly from a library of
congeners of the frontline drug ethambutol, high-throughput
screening (HTS) typically does not deliver a clinical candidate.
Exhaustive optimization of a screening hit must occur, initially
following whole-cell activity and then considering pharmacoki-
netics, pharmacodynamics, and safety to afford clinical candi-
dates such as PA-824 (Stover et al., 2000). The remainder of
current TB clinical trials arose from repurposing other anti-
bacterials or rediscovering antituberculars from decades ago
(Lienhardt et al., 2012b). Despite these successful efforts, the
expected failure of ~85% of clinical candidates (Ledford, 2011)
and the growth of TB drug resistance necessitate new clinical
submissions, which ultimately require the discovery of novel
existing HTS data, focusing on not just the few most promising
hits due to resource limitations but the entire data set of actives
We hypothesize that prior knowledge of Mtb actives and
inactives, combined with machine learning models, can signifi-
cantly focus compound selection and improve screening effi-
ciency (Ekins and Freundlich, 2011; Ekins et al., 2010c, 2010d,
2011), as practiced in the pharmaceutical industry (Prathipati
et al., 2008), to improve the performance of virtual screening
(Schneider, 2010). These and other cheminformatics methods
370 Chemistry & Biology 20, 370–378, March 21, 2013 ©2013 Elsevier Ltd All rights reserved
have been utilized in the TB field, although in our opinion not
to the same extent as in the pharmaceutical industry (Ekins et al.,
2011). Thus, cheminformatics technologies such as virtual
screening and structure-based design have contributed to clin-
ical submissions in the pharmaceutical industry (Volarath et al.,
2007) but have yet to impact TB drug candidates (Barry et al.,
2000; Ekins and Freundlich, 2011; Ekins et al., 2010c, 2010d;
Koul et al., 2011).
An alternative cheminformatics approach to computational
screening discriminates between the user-defined actives and
inactives present in a screening data set. This approach, called
Bayesian modeling, can then be utilized in an unsupervised or
automated manner to predict the likelihood of a new molecule
(absent from the training set) being a hit (using Bayes’ theorem
described in Equation 1). We (Ekins and Freundlich, 2011; Ekins
2011; Prathipati et al., 2008) have undertaken a systematic
Bayesian machine learning modeling effort focused solely on
Mtb bioactivity. Bayesian models were developed that learn
from public efficacy data for both actives and inactives and
correlate 2D compound structural features with antitubercular
activity or lack thereof. We have consistently seen enrichment
values from 4- to 10-fold over experimental HTS (Ekins and
Freundlich, 2011; Ekins et al., 2010c, 2010d; Sarker et al.,
2012) for resultant models. Yet, prospective validation of these
models (using test molecules predicted prior to testing in vitro)
has been lacking. Here, we validate these
models with the prospective identification of multiple antituber-
cular hits from a commercial library with a 14% hit rate (at least
one to two orders of magnitude greater than empirical HTS;
Ananthan et al., 2009; Gold et al., 2012; Maddry et al., 2009;
ley et al., 2012).
Drug leads must not only be efficacious but also sufficiently
noncytotoxic to mammalian cells (Langdon et al., 2010). We
have therefore created dual-event Bayesian models combining
antitubercular activity and mammalian cell cytotoxicity. We
demonstrate enhanced predictive power over models that
exclude cytotoxicity. In addition, we apply a dual-event model
to the discovery of Mtb inhibitors from a published library of anti-
malarial hits (Gamo et al., 2010), demonstrating a significant
application to drug discovery, and report a potent small-mole-
cule TB drug lead exhibiting nanomolar growth inhibition of
cultured mycobacteria and acceptable in vitro and in vivo mouse
Validation of a Bayesian Model for TB Whole-Cell
Mtb Bayesian models (Ekins and Freundlich, 2011; Ekins et al.,
2010c, 2010d; Periwal et al., 2011; Prathipati et al., 2008; Sarker
et al., 2012) have lacked critical published validation as to
their ability to prospectively predict novel actives. To prospec-
tively test the previously generated Molecular Libraries Small
Molecule Repository (MLSMR) dose-response Bayesian model
(Ekins et al., 2010c), we virtually screened a commercial library
of >25,000 compounds (Asinex, available in the Collaborative
Drug Discovery [CDD] database; Hohman et al., 2009). Prin-
cipal-component analysis (PCA) demonstrated that the com-
mercial library members occupy similar chemical space as do
the model’s actives (Figure S1 available online). The compounds
were ranked by Bayesian score (range, −28.4 to 15.3), which
relates to the likelihood of a compound being active through
determination of its molecular features compared to the features
in the model’s actives and inactives. The more positive the value,
the higher the probability of being active. The top-scoring 100
compounds (Bayesian score 9.4 to 15.3) were then selected.
Ninety-nine of these were commercially available and tested
for growth inhibition of Mtb (Table S1). Fourteen of these com-
pounds exhibited an IC50 (amount of compound inhibiting 50%
growth of the bacteria in culture) <25 µg/ml, affording a hit
rate of 14%. The most potent molecule (SYN 22269076),
featuring an IC50 of 1.1 µg/ml or 3.2 µM (Figure 1), represents
an interesting member of the pyrazolo[1,5-a]pyrimidine class
present in the HTS data utilized to train the model (Ananthan
et al., 2009).
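The selection step and the resulting hit rate can be sketched in a few lines of Python. This is a minimal illustration, not the authors' workflow: the compound identifiers and scores in `example_scores` are hypothetical, and only the counts (14 hits among 99 tested compounds) come from the text.

```python
def top_n_by_score(scores, n):
    """Return the n compound IDs with the highest Bayesian scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [compound_id for compound_id, _ in ranked[:n]]


def hit_rate(n_hits, n_tested):
    """Percentage of tested compounds confirmed as hits."""
    return 100.0 * n_hits / n_tested


# Hypothetical scores, for illustration only (not data from the study).
example_scores = {"cpd_A": 15.3, "cpd_B": -28.4, "cpd_C": 9.4, "cpd_D": 2.0}
selected = top_n_by_score(example_scores, 2)

# Counts reported in the text: 14 hits among the 99 compounds tested.
observed_hit_rate = hit_rate(14, 99)
```

In the study the same idea was applied to the top-scoring 100 compounds of the >25,000-member library.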
Dual-Event Bayesian Models
Previous Mtb Bayesian models (Ekins and Freundlich, 2011;
Ekins et al., 2010c, 2010d; Periwal et al., 2011; Prathipati et al.,
2008; Sarker et al., 2012) did not account for molecule cytotox-
icity, and although machine learning methods have been created
for various human toxicities (Ekins et al., 2010b; Langdon et al.,
2010; Zientek et al., 2010), these have not been combined with
bioactivity endpoints. We created a dual-event Bayesian model
merging in vitro cytotoxicity (CC50) data for the compounds
with Mtb dose-response data (Ananthan et al., 2009; Maddry
et al., 2009) used in our previous studies. We selected for active
(IC90 < 10 µg/ml) and noncytotoxic molecules (selectivity index;
SI = CC50/IC90 > 10) to construct the dual-event Bayesian model
(Southern Research Institute-MLSMR dose-response and cyto-
toxicity model). Thus, this model has learned what molecular
Figure 1. The Validation of a Single-Event Bayesian Model for Mtb
Chemical structure of the most active compound (IC50 = 1.1 µg/ml), SYN
22269076, from a virtual screen of a 25,000-member commercial library. See
also Figures S1–S7 and Tables S1–S4.
Bayesian Models for Mtb Drug Discovery
features among the training set are consistent with Mtb growth
inhibition and lack of comparatively significant toxicity to Vero
cells. This model had a leave-one-out receiver operator charac-
teristic (ROC; a general performance measure of models, where
an ideal model has an ROC = 1; Zientek et al., 2010) value of 0.86
to the previously published MLSMR single-point and dose-
response models (Ekins et al., 2010c), which have been exten-
sively retrospectively validated (Ekins and Freundlich, 2011;
seven of nine first- and second-line TB drugs (Sacchettini et al.,
2008) that were absent in the training set (Table S3). Notably,
isoniazid and pyrazinamide are prodrugs (Konno et al., 1967;
Rozwarski et al., 1998; Scorpio and Zhang, 1996) and were not
predicted to be active by the model, which only learns from
explicit chemical structures and is likely ignorant of chemical
reactivity. Using functional class fingerprint 6 (FCFP-6; each
atom is described by its six nearest neighbors; Zientek et al.,
2010) descriptors, we identify those substructure descriptors
contributing to Mtb activity and an acceptable SI (Figure S2), in-
cluding the oxazole 2-thioether, aryl/heteroaryloxyacetic acid,
and quinolone 3-carboxylic acid cores. We also note chemical
features inconsistent with both satisfactory activity and SI,
such as thiazole 2-amides, 2-substituted pyrazoles, 2-sub-
stituted benzimidazoles, N-functionalized pyrrolidines, N-aryla-
mides, and 2-substituted pyridines (Figure S3).
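The dual-event labeling rule used to define training-set actives (IC90 below 10 and SI = CC50/IC90 above 10) can be written down directly. The sketch below is an illustration under stated assumptions: the function names and the example values are hypothetical, and IC90 and CC50 are assumed to be given in the same concentration units.

```python
def selectivity_index(cc50, ic90):
    """SI = CC50 / IC90, with both values in the same concentration units."""
    return cc50 / ic90


def is_dual_event_active(ic90, cc50, ic90_cutoff=10.0, si_cutoff=10.0):
    """Label a compound active only if it is both potent and selective."""
    potent = ic90 < ic90_cutoff
    selective = selectivity_index(cc50, ic90) > si_cutoff
    return potent and selective


# Hypothetical compounds as (IC90, CC50) pairs, for illustration only.
potent_and_selective = is_dual_event_active(2.0, 40.0)   # SI = 20
potent_but_cytotoxic = is_dual_event_active(2.0, 10.0)   # SI = 5
weak_but_selective = is_dual_event_active(20.0, 400.0)   # IC90 too high
```

Only the first hypothetical compound would enter the training set as an active; the other two would be treated as inactives.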
We subsequently generated prospective predictions for a
screen of a targeted human kinase inhibitor library against Mtb
(Reynolds et al., 2012), using the dual-event model in addition
to previously described Bayesian models omitting cytotoxicity
(Ekins et al., 2010c). Inclusion of the cytotoxicity parameter re-
sulted in a virtual screen hit rate of 12.9% for the top 1,000
compounds in this data set, compared with 3.5%–9.8% in the
absence of cytotoxicity data (Figure S4). This is over three times
the enrichment compared to the random hit rate, nearly three
times higher than the recently reported hit rate for the HTS
(5.1%) for these compounds (Reynolds et al., 2012), and 8-fold
turally diverse libraries (Ananthan et al., 2009; Maddry et al.,
These promising preliminary results prompted us to generate
a second dual-event model with Tuberculosis Antimicrobial
Acquisition and Coordinating Facility (TAACF)-CB2 library
dose-response and cytotoxicity data (Ananthan et al., 2009).
The robustness of this model was examined by the calculation
of the ROC under two conditions. First, each molecule is left
out and the training set is used to predict the missing molecule.
Second, half of the members of the training set were left out and
the model was rebuilt. This was repeated 100 times at random
(Zientek et al., 2010). The leave-one-out ROC for this model
was 0.64, and leave-out-50% × 100 statistics were lower than
previous models (0.59) (Ekins et al., 2010c) but still acceptable,
even though the training set overlaps well with the MLSMR and
the FCFP-6 descriptors, we can identify those substructure
descriptors consistent with both activity and lack of cytotoxicity
(Figure S6) including N-alkylated imidazole, 1-amino-3-chloro-
benzene, diaminopyrimidine, 5-substituted-1,3,4-thiadiazol-2-
amine, and tetrafluorophenylamide. Features inconsistent with
both activity and lack of cytotoxicity included nitroolefin, 3-hy-
drazonoindolin-2-one, and 1-amino-2-chlorophenyl (Figure S7).
Bayesian Models Increase Efficiency of TB Drug
We have published on the potential to ‘‘pathogen hop’’ between
inhibitors of Plasmodium falciparum and Mtb (Vilchèze et al.,
2011). The dual-event TAACF-CB2 dose-response and cytotox-
icity Bayesian model was used to rank a previously published set
of >13,000 potential antimalarial hits (Gamo et al., 2010), pos-
sessing low cytotoxicity and chemical diversity, that represent
chemical tools/probes being explored by the infectious disease
community. From the commercially available subset, the top
46 molecules were visually inspected and 7 were chosen as
representative (i.e., chemotypes such as furanylamide, quinazo-
line, quinoline, triazine, aminothiazole, and arylsulfonamide).
Five were identified as active against Mtb with minimum inhibi-
tory concentration (MIC) values ≤2 µg/ml (71% hit rate;
Table 1). Out of these five hits, three molecules represent
antitubercular chemical structures and the remaining two hits
have been reported to exhibit in vitro efficacy versus Mtb, but
without published follow-up (Bruhin et al., 1969; Reynolds
et al., 2012), making them promising starting points for drug
discovery. Interestingly, the most active compound was
zinyl)-N2,N4-diphenyl-1,3,5-triazine-2,4-diamine, with an MIC
of 0.0625 µg/ml. This compound has drug-like properties with
a molecular weight of 414.42 g/mol, 10 hydrogen-bond accep-
tors, 3 hydrogen-bond donors, 9 rotatable bonds, a total polar
surface area of 94.27 Å², and a calculated logP of 2.86 (Oprea
et al., 2001; Walters and Murcko, 2002).
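The reported properties can be checked against commonly cited drug-likeness thresholds. The sketch below assumes the Lipinski rule-of-five limits and the Veber limits on rotatable bonds and polar surface area; the text does not state which criteria the authors applied, so the choice of thresholds is an assumption.

```python
# Upper limits from the Lipinski rule of five and the Veber criteria.
# Which criteria the authors had in mind is an assumption of this sketch.
UPPER_LIMITS = {
    "mol_weight": 500.0,  # g/mol (Lipinski)
    "logp": 5.0,          # calculated logP (Lipinski)
    "hbd": 5,             # hydrogen-bond donors (Lipinski)
    "hba": 10,            # hydrogen-bond acceptors (Lipinski)
    "rot_bonds": 10,      # rotatable bonds (Veber)
    "psa": 140.0,         # polar surface area, square angstroms (Veber)
}

# Property values as reported in the text for the most active compound.
tcmdc_125802 = {
    "mol_weight": 414.42,
    "logp": 2.86,
    "hbd": 3,
    "hba": 10,
    "rot_bonds": 9,
    "psa": 94.27,
}


def violations(props, limits=UPPER_LIMITS):
    """Return the names of properties that exceed their upper limit."""
    return [name for name, limit in limits.items() if props[name] > limit]


failed = violations(tcmdc_125802)
```

Under these thresholds the reported values produce no violations, although the hydrogen-bond acceptor count sits exactly at the limit.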
A search of the publicly available MLSMR 215,110-compound
screening data (Maddry et al., 2009) was conducted for diamino-
triazine hydrazones with positive growth inhibition (at 10 µM
compound concentration) of cultured Mtb. One hundred and
seven molecules were found featuring 3 nitrofuryl hydrazones,
10 furyl hydrazones (no nitro substitution), and 19 nitrophenyl hy-
drazones. Notable among the 32 inactives were hydrazones with
aryl, nitroaryl, and furyl substituents (Table S4). These molecules
were not discussed previously (Maddry et al., 2009). A search of
the literature uncovered a single report of the antitubercular
activity of TCMDC-125802 and related triazines against Mtb
from over 40 years ago (Bruhin et al., 1969), which came to light
after choosing TCMDC-125802 from the scored antimalarials
set. It was shown to have activity against cultured Mtb strains
(Bruhin et al., 1969). However, it was reported to be inactive in
a mouse model of acute infection, as judged solely by extension
of survival compared to an untreated control. A close analog,
where the two aniline moieties were replaced with i-propylamino
groups, demonstrated potent in vitro and in vivo activity but was
qualitatively less active in vivo than the frontline drug isoniazid
(INH) (Bruhin et al., 1969).
The in vitro potency of TCMDC-125802 prompted us to re-
examine its activity and safety profiles in vitro and in vivo.
TCMDC-125802 was synthesized on a gram scale (Supple-
mental Experimental Procedures) and found to be bactericidal,
exhibiting a minimum bactericidal concentration (the concentra-
tion of compound reducing the initial bacterial load by 2 log10
units) (Saunders, 1992) of 0.25–0.5 µg/ml (Figure 2A). Over the
course of 3 weeks, 64× the MIC (4.0 µg/ml) afforded a 6 log10
drop in CFUs (Figure 2B). A CC50 of 4.0 µg/ml with African green
monkey kidney cells (Vero cells; ATCC) was determined, repre-
senting a selectivity index (CC50/MIC) of 64. Additionally, we
determined a CC50 of 1.0 µg/ml with B6D2F1 mouse bone
marrow-derived macrophages. The cytotoxicity data may be
compared with that previously reported (5% growth inhibition
of HepG2 cells in the presence of 10 µM TCMDC-125802)
(Gamo et al., 2010). Significantly, TCMDC-125802, formulated
in 0.5% methylcellulose, demonstrated no overt toxicity in
C57BL/6 mice for 7 days post 3 days’ dosing (30, 100, and
300 mg/kg by gavage). Subsequently, TCMDC-125802 was
examined in a standard mouse model of acute Mtb infection.
This model uses the highly susceptible γ-interferon gene-disrupted
(GKO) C57BL/6 mouse (Lenaerts et al., 2003). Eight- to ten-
week-old female specific pathogen-free C57BL/6-Ifngtm1ts
(GKO) mice (Jackson Laboratories) were infected with Mtb
Erdman strain via low-dose aerosol exposure.
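The selectivity index and log-kill figures quoted in this section follow from simple arithmetic. The sketch below uses the MIC and CC50 values reported in the text (all in the same concentration units); the CFU counts are hypothetical, and the macrophage ratio is not stated in the text and is computed here only for comparison.

```python
import math


def selectivity_index(cc50, mic):
    """Ratio of mammalian-cell CC50 to antibacterial MIC (same units)."""
    return cc50 / mic


def log10_reduction(cfu_start, cfu_end):
    """Number of log10 units by which the bacterial load dropped."""
    return math.log10(cfu_start / cfu_end)


MIC = 0.0625            # reported MIC of TCMDC-125802
CC50_VERO = 4.0         # reported CC50 with Vero cells
CC50_MACROPHAGE = 1.0   # reported CC50 with mouse macrophages

si_vero = selectivity_index(CC50_VERO, MIC)               # 64.0
si_macrophage = selectivity_index(CC50_MACROPHAGE, MIC)   # 16.0
dose_at_64x_mic = 64 * MIC                                # 4.0

# Hypothetical CFU counts: a 100-fold drop is a 2 log10 reduction,
# the threshold used in the text to define bactericidal activity.
example_drop = log10_reduction(1e6, 1e4)
```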
Thirteen days postinfection, the mice were administered
TCMDC-125802 300 mg/kg by gavage daily for 9 consecutive
days. One day postcessation of TCMDC-125802 dosing, the
animals were euthanized. TCMDC-125802 was not found to
reduce the bacillary load in mouse lungs and spleens as com-
pared to the untreated control (Table S5).
Machine learning using Bayesian models has previously focused
on Mtb activity and excluded cytotoxicity data (Ekins and
Freundlich, 2011; Ekins et al., 2010c, 2010d; Periwal et al.,
2011; Prathipati et al., 2008; Sarker et al., 2012). We now provide
experimental validation of such models through the prospective
Table 1. Columns: Compound TCMDC No.; Chemical Structure; Bayesian Score; MIC (µg/ml); % Inhibition HepG2 at 10 µM Compound
The compound TCMDC number was assigned by GlaxoSmithKline, as detailed in their publication (Gamo et al., 2010). The Bayesian score was calcu-
lated utilizing the TAACF-CB2 dose-response and cytotoxicity model. The MIC for each compound was determined versus Mtb H37Rv. The HepG2
cytotoxicities were obtained from Gamo et al. (2010).
prediction of actives and the demonstration of their antituber-
cular efficacy with a hit rate of 14%. Finding new hits is, there-
fore, greatly enhanced over current screening methods, as
typical experimental HTS success rates are less than 1% (Payne
et al., 2007) and for Mtb growth-inhibition screens they are
similar (Ananthan et al., 2009; Gold et al., 2012; Maddry et al.,
2009; Magnet et al., 2010; Mak et al., 2012; Reynolds et al.,
2012; Stanley et al., 2012). Interestingly, many of the prospective
hits from a vendor library exhibited a pyrazolo[1,5-a]pyrimidine
core, and five actives were found with similar (but not identical)
substructures from a kinase library screen (Reynolds et al.,
2012). All exhibited acceptable SI values of >10. The most active
compound, SYN 22269076 (Figure 1), is notably absent from any
of the public TB data sets, suggesting it may be promising to
perform hit-to-lead optimization on this compound series.
An interesting aspect of the dual-event machine learning
models reported herein lies in their combination of bioactivity
and cytotoxicity features. This report describes this strategy, re-
sulting in the selection of active molecules, such as TCMDC-
125802, with promising efficacy and cytotoxicity profiles at a
much higher success rate than with traditional HTS. The ability
to identify noncytotoxic hits is significant because HTS cam-
paigns often fail to find whole-cell actives devoid of cytotoxicity
(Payne et al., 2007). Toxicity-related events lead to more than
one-third of all clinical trial failures and 90% of withdrawals of
approved drugs (Schuster et al., 2005). Although learning from
large cytotoxicity data sets has been described previously by
us (Ekins et al., 2010b) and others (Langdon et al., 2010), this
has not been applied in the TB field. A dual-event Bayesian
model was utilized to score an antimalarial library (Gamo et al.,
2010), and the top 23 molecules that were commercially avail-
able contained only 5 molecules (21.7%) with significant cyto-
toxicity (≥40% HepG2 growth inhibition at 10 µM compound).
It is important to note that this library was biased toward
compounds with relatively low cytotoxicity (Gamo et al., 2010).
A performance comparison among our models as to their ability
to identify hits from a kinase-focused library demonstrates a
potential advantage for dual-event versus single-event models
(Figure S4) that warrants further study. Adding the cytotoxicity
criteria to define an active hit narrows down the number of hits
used in model training. It is also possible we are removing
spurious hits that work by multiple mechanisms, thus failing to
discriminate sufficiently between Mtb and model mammalian
Significantly, the dual-event model identified the TB drug lead
TCMDC-125802, which exhibited promising in vitro bactericidal
activity against Mtb, acceptable mammalian cellular cytotox-
icity, and in vivo mouse safety. Although the compound did not
show activity at 300 mg/kg dosing in a single mouse model of
acute infection, the value of a small molecule that is safe in vivo
and possesses an excellent in vitro activity profile is significant
for lead optimization. Additionally, the chemical similarity of it
to the corresponding di-(i-propylamino) variant with demon-
strated in vivo efficacy (Bruhin et al., 1969) should engender
confidence that novel analogs will be found to have efficacy in
will leverage existing data around the core structure published
previously (Bruhin et al., 1969) and from the MLSMR screen
(Maddry et al., 2009) (Table S4) to probe absorption, distribution,
metabolism, and excretion properties that may be respon-
sible for the lack of activity of TCMDC-125802 in the GKO mouse
model. Additionally, none of the previous Mtb machine learning
studies has derived such an active antitubercular with demon-
strated in vivo safety.
Future efforts will seek to discern the Mtb target(s) of TCMDC-
125802 through ongoing mechanistic studies. It may share
a common mechanism with nitrofurantoin (Tanimoto similarity
0.68; an approved antibacterial for uncomplicated urinary tract
infections [Garau, 2008] with modest activity (MIC = 12 µg/ml)
against Mycobacterium bovis BCG [Murugasu-Oei and Dick,
2000]), which undergoes bacteria-induced reduction of the
nitro group via one or more nitroreductases (Whiteway et al.,
1998) to a toxic nitroso or hydroxylamine derivative (Sandegren
et al., 2008). One must also consider nitroimidazoles such as
PA-824 (Stover et al., 2000) (Tanimoto similarity 0.63), which
exhibits anaerobic activity via the release of reactive nitrogen
species and aerobic efficacy through an undetermined mecha-
nism (Singh et al., 2008).
dual-event Bayesian models for whole-cell Mycobacterium
tuberculosis activity to accelerate the discovery of novel
hits and leads will be readily translated to other therapeutic
Figure 2. The In Vitro Efficacy Profile of TCMDC-125802 Identified by
a Dual-Event Bayesian Model for Mtb Efficacy and Cytotoxicity
(A) Minimum bactericidal concentration determination through quantification
of Mtb CFUs as modulated by various compound concentrations. Error bars
denote standard deviations.
(B) Killing kinetics examined through the time dependence of Mtb CFUs in the
presence of TCMDC-125802. Error bars denote standard deviations.
See also Table S5.
areas. In so doing, we expect to achieve further enhance-
ments through strategies such as consensus modeling
(Ganguly et al., 2006) and combining data sets. The detailed
study of the effect of training sets and model parameters on
active enrichment may lead to multievent Bayesian models
that focus on compound attributes important for in vitro
and/or in vivo efficacy (Ekins et al., 2010a). Because drug
discovery is time intensive and very costly, machine learning
approaches can increase the efficiency of screening and
should be implemented prior to future high-throughput
screening campaigns (Ballell et al., 2005; Nathan, 2011;
Sacchettini et al., 2008). These campaigns represent multi-
million dollar investments that will be more fully utilized
through Bayesian models generated from previously gener-
ated data. This will also spare resources for more expensive
and critical downstream studies to select candidates for
All experimental protocols were approved with written consent by the Animal
Care and Use Committee of Colorado State University (approval number
ACUC no. 12-3723A), which abides by the USDA Animal Welfare Act and
the Public Health Service Policy on Humane Care and Use of Laboratory
Compounds were purchased from Asinex, Enamine, Life Chemicals, and
Ryan Scientific and assayed as supplied without further quality assessment.
TCMDC-125802 was also synthesized (Supplemental Experimental Proce-
dures). The NIAID Southern Research Institute screening data sets (Ananthan
et al., 2009; Maddry et al., 2009; Reynolds et al., 2012), Asinex library (n >
25,000), and antimalarial compounds (n > 13,000) (Gamo et al., 2010) were
downloaded from the CDD TB database (Ekins et al., 2010c) and used for
computational analysis (Ekins and Freundlich, 2011; Ekins et al., 2010c,
Dual-Event Bayesian Model Building
Bayesian classification is a simple probabilistic classification model based on
Bayes’ theorem (Equation 1):
$$p(h \mid d) = \frac{p(d \mid h)\,p(h)}{p(d)} \qquad (1)$$
where h is the hypothesis or model, d is the observed data, p(h) is the prior
belief (probability of hypothesis h before observing any data), p(d) is the
data evidence (marginal probability of the data), p(d|h) is the likelihood
(probability of data d if hypothesis h is true), and p(h|d) is the posterior prob-
ability (probability of hypothesis h being true given the observed data d).
Bayesian statistics take into consideration the complexity of the model as
well as the likelihood of a model, such that it automatically picks the simplest
model that can explain the observed data and prevents overfitting. In the
Bayesian modeling software within Discovery Studio (Accelrys), the learned
models are created with a learn-by-example paradigm: the user marks the
sample data that are of interest (good or active), and then the system learns
to distinguish them from background data (i.e., those that are inactive). The
learning process generates a large set of Boolean features (e.g., and, not,
or etc.) from the input descriptors, and then collects the frequency of occur-
rence of each feature in the good subset and in all data samples. To apply the
model, the features of the sample are generated, and a weight is calculated
for each feature using a Laplacian-adjusted probability estimate to account
for the different sampling frequencies of different features. The weights are
summed to provide a probability estimate, which is a relative predictor of
the likelihood of that sample being from the good subset (e.g., a more positive
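The feature-weighting scheme described above can be illustrated with a small Laplacian-corrected Bayesian scorer. This is a sketch under stated assumptions, not the Discovery Studio implementation: it uses the commonly described estimate log((A_F + 1) / (N_F × P + 1)), where N_F is the number of training samples containing feature F, A_F is the number of those that are active, and P is the overall fraction of actives. The feature names and training samples are hypothetical.

```python
import math
from collections import Counter


def train_weights(samples):
    """Compute one weight per feature from (feature_set, is_active) pairs.

    Assumed form of the Laplacian-corrected estimate:
        weight(F) = log((A_F + 1) / (N_F * P + 1))
    """
    base_rate = sum(1 for _, active in samples if active) / len(samples)
    total_counts = Counter()
    active_counts = Counter()
    for features, active in samples:
        for feature in set(features):
            total_counts[feature] += 1
            if active:
                active_counts[feature] += 1
    return {
        feature: math.log(
            (active_counts[feature] + 1) / (total_counts[feature] * base_rate + 1)
        )
        for feature in total_counts
    }


def bayesian_score(weights, features):
    """Sum the weights of the sample's features; unseen features add 0."""
    return sum(weights.get(feature, 0.0) for feature in set(features))


# Hypothetical training set: feature sets standing in for fingerprints.
training = [
    ({"a", "b"}, True),
    ({"a"}, True),
    ({"c"}, False),
    ({"b", "c"}, False),
]
weights = train_weights(training)
score_like_actives = bayesian_score(weights, {"a"})
score_like_inactives = bayesian_score(weights, {"c"})
```

A feature seen mostly in actives receives a positive weight, one seen mostly in inactives a negative weight, and one seen equally in both a weight near zero, so the summed score is a relative ranking value and not a calibrated probability.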
Dual-event Bayesian classifier models were created for (1) the MLSMR
dose-response and cytotoxicity data (Maddry et al., 2009) for 2,273 com-
pounds (165 active with IC90 < 10 µg/ml and selectivity SI > 10 for Vero cells)
and (2) the TAACF-CB2 dose-response and cytotoxicity data for 1,783
compounds (1,006 active with IC90 < 10 µg/ml and selectivity SI > 10 for
Vero cells) as described (Ekins and Freundlich, 2011; Ekins et al., 2010c,
2010d) using Discovery Studio 2.5.5 (Bender et al., 2007; Hassan et al.,
2006; Klon et al., 2006; Prathipati et al., 2008; Rogers et al., 2005). Models
were validated using leave-one-out cross-validation in which each sample
was left out one at a time, a model was built using the remaining samples,
and that model was utilized to predict the left-out sample. Each model was
internally validated and ROC plots were generated, and the cross-validated
ROC area under the curve (XV ROC AUC) was calculated. All models gener-
ated were additionally evaluated by leaving out 50% of the data and rebuilding
the model 100 times using a custom protocol for validation to generate the
ROC AUC, concordance, specificity, and selectivity.
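The two validation schemes (leave-one-out and leave-out-50% repeated 100 times) can be sketched as follows. This is an illustration only: `fit` and `predict` are a hypothetical toy model on a single numeric descriptor standing in for the Bayesian classifier, the data in `toy` are invented, and the random splitting is an assumption about how the leave-out-50% procedure might be implemented.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the chance an active outscores an inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(train):
    """Toy stand-in model: class means of a single numeric descriptor."""
    xs = [x for x, _ in train]
    overall = sum(xs) / len(xs)
    act = [x for x, y in train if y]
    ina = [x for x, y in train if not y]
    mean_act = sum(act) / len(act) if act else overall
    mean_ina = sum(ina) / len(ina) if ina else overall
    return mean_act, mean_ina


def predict(model, x):
    """Higher score means closer to the active mean than the inactive mean."""
    mean_act, mean_ina = model
    return abs(x - mean_ina) - abs(x - mean_act)


def leave_one_out_auc(data):
    """Predict each sample from a model built on all the other samples."""
    scores = []
    for i, (x, _) in enumerate(data):
        model = fit(data[:i] + data[i + 1:])
        scores.append(predict(model, x))
    return roc_auc([y for _, y in data], scores)


def leave_half_out_auc(data, repeats=100, seed=0):
    """Average AUC over repeated random 50/50 train/test splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        labels = [y for _, y in test]
        if len(set(labels)) < 2:
            continue  # AUC is undefined without both classes
        model = fit(train)
        aucs.append(roc_auc(labels, [predict(model, x) for x, _ in test]))
    return sum(aucs) / len(aucs) if aucs else float("nan")


# Hypothetical data: (descriptor value, is_active).
toy = [(1, False), (2, False), (3, False), (4, False),
       (5, True), (6, True), (7, True), (8, True)]
loo_auc = leave_one_out_auc(toy)
half_auc = leave_half_out_auc(toy)
```

The leave-out-50% estimate is typically the more pessimistic of the two because each model is rebuilt from half as much data.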
PCA in Discovery Studio 2.5.5 (Accelrys) was used to compare the molecular
descriptor space for the dose-response data for the three data sets from the
Southern Research Institute (Ananthan et al., 2009; Maddry et al., 2009;
Reynolds et al., 2012) as well as to compare the actives from the MLSMR data
set and the Asinex database (using ALogP; molecular weight; number of
hydrogen-bond donors; number of hydrogen-bond acceptors; number of
rotatable bonds; number of rings; number of aromatic rings; and molecular
fractional polar surface area calculated in the software).
Molecular properties of TCMDC-125802 were determined in MolPrime
(Molecular Materials Informatics) (Clark et al., 2012). Molecular similarity was
calculated using MDL fingerprints and the Tanimoto similarity algorithm in
Discovery Studio Version 3.5.5 (Accelrys).
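For reference, the Tanimoto similarity between two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below represents each fingerprint as a set of on-bit indices; the bit indices are hypothetical and are not real MDL keys, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as on-bit index sets."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / union


# Hypothetical on-bit indices, for illustration only.
fingerprint_1 = {1, 2, 3, 4}
fingerprint_2 = {3, 4, 5}
similarity = tanimoto(fingerprint_1, fingerprint_2)  # 2 shared / 5 total = 0.4
```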
Mtb Growth-Inhibition Assays
Southern Research Institute
The Mtb HTS assay was modified from that previously described (Collins and
Franzblau, 1997) by using black, clear-bottom, 384-well microtiter plates and
7H12 broth. It should be noted that the initial screening concentration and final
units for the reported inhibition were driven by how the libraries were supplied
(see Ananthan et al., 2009; Maddry et al., 2009; Reynolds et al., 2012), as some
were µM and some were µg/ml.
Typically, compound stocks of 10 mg/ml in 100% DMSO were diluted in
assay media, and 25 µl aliquots of these diluted compounds were transferred
to 384-well plates. Amikacin was included in the positive-control wells in every
the approximate MIC and is an indicator of proper assay performance of each
plate. The high concentration completely inhibits growth and is used in lieu
of uninoculated medium (background) to calculate percent inhibition by
the test compounds for each plate. Plates containing test compounds
(320 compounds/plate) and positive-control compounds were transferred to
the BSL3 facility for bacteria addition and incubation. The Mtb stock H37Rv
was diluted to 1–2 × 10⁵ CFU/ml in the assay medium, Middlebrook 7H12
broth (7H9 broth supplemented with 0.1% casitone, 5.6 μg/ml palmitate,
0.5% bovine serum albumin, and 4 μg/ml catalase), and 25 μl was plated
over the compounds. Positive- and negative-control wells were included in
each plate. Amikacin was included in one of the compound wells as an internal
control in dose-response runs. Plates were placed in stacks of two inside
double low-density polyethylene bags and incubated for 7 days at 37°C with
approximately 90% humidity. After 7 days of incubation, endpoint reagent
(2 parts Alamar blue [Trek Diagnostics] + 1.5 parts 18.2% Tween 80 [Difco]
diluted in Milli-Q water) was added to all wells in a volume of 9 μl/well. The
plates were returned to the incubator for an additional 18–20 hr. Plates were
sealed and bottom read for fluorescence using a PerkinElmer Envision plate
reader at 535 nm excitation and 590 nm emission.
Each assay run contained one plate of uninoculated medium (sterility control),
another plate containing inoculated medium (growth control), and a 96-well
plate with ethambutol at the approximate MIC (0.5 μg/ml) and 20 times the
MIC (10 μg/ml). In addition to fluorometric reads, these plates were read at
an absorbance of 615 nm (the approximate peak wavelength for oxidized dye)
and used to monitor the quality of the Alamar blue as well as adequate growth
of the organism. Expected absorbance readings were about 0.8 and 0.2 for
a good dye reagent (medium only) and growth control wells, respectively.
The ethambutol plate was used to help confirm that a contaminating organism
tible whereas other genera are resistant.
Data were analyzed using IDBS ActivityBase. Results of the single-dose
screen were expressed as percent inhibition, which was calculated as 100 ×
[(median cell control – high-dose control drug) – (test well – high-dose control
drug)]/(median cell control – high-dose control drug). The dose-response data
were analyzed using a four-parameter logistic fit (Excel Fit Equation 205) with
the maximum and minimum locked at 100 and 0. From these curves, TB IC90
and TB IC50 values were calculated for Mtb.
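The two calculations above can be sketched as follows. The percent-inhibition function follows the stated formula. The curve fit is an assumption-laden simplification: with the maximum and minimum locked at 100 and 0, the logistic model y = 100/(1 + (IC50/x)^h) becomes linear on the logit scale, so the sketch uses ordinary linear regression rather than the nonlinear fit performed by the fitting software, whose exact parameterization is not reproduced here. Function names and plate-reader values are hypothetical.

```python
import math
from statistics import median


def percent_inhibition(test_well, cell_controls, high_dose_control):
    """Percent inhibition relative to the growth (cell control) and full-kill signals."""
    window = median(cell_controls) - high_dose_control
    return 100.0 * (window - (test_well - high_dose_control)) / window


def fit_locked_logistic(concs, inhibitions):
    """Fit y = 100 / (1 + (IC50 / x)**h) by linear regression on the logit scale."""
    pts = [
        (math.log(c), math.log(y / (100.0 - y)))
        for c, y in zip(concs, inhibitions)
        if 0.0 < y < 100.0  # the logit is undefined at the locked limits
    ]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(t for _, t in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (t - my) for x, t in pts)
    hill = sxy / sxx
    ic50 = math.exp(mx - my / hill)
    return ic50, hill


def inhibitory_conc(ic50, hill, percent):
    """Concentration giving the requested percent inhibition on the fitted curve."""
    return ic50 * (percent / (100.0 - percent)) ** (1.0 / hill)


# Made-up fluorescence readings for one compound in dose response.
cell_controls = [9800.0, 10000.0, 10200.0]
high_dose_control = 500.0
concs = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
wells = [9100.0, 8000.0, 6000.0, 3500.0, 1800.0, 900.0]

inhibitions = [percent_inhibition(w, cell_controls, high_dose_control) for w in wells]
ic50, hill = fit_locked_logistic(concs, inhibitions)
ic90 = inhibitory_conc(ic50, hill, 90.0)
print(round(ic50, 3), round(ic90, 3))
```

With the limits locked, IC90 follows from the fitted parameters as IC50 × 9^(1/h).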
New Jersey Medical School, University of Medicine and Dentistry of New Jersey
Each compound was dissolved in DMSO at a final concentration of 12 mg/ml
and serial dilutions were performed to generate test concentrations ranging
from 32 μg/ml to 0.488 ng/ml. M. tuberculosis strain H37Rv at the
mid-logarithmic stage of growth (OD580 = 0.4) was diluted 1:100, and 0.1 ml was
added to each well of a 96-well plate along with 0.1 ml of test compound
solution. After 6 days of incubation at 37°C, Alamar blue (Invitrogen) reagent
was added along with 12.5 μl of 20% Tween 80 (Sigma) to evaluate bacterial
cell viability. Plates were scanned 24 hr later at 570 nm with a reference
wavelength of 600 nm utilizing a Biotek Instruments ELX 808. Inoculum control
wells of untreated H37Rv were used to create a survival-inhibition curve with
each assay. Rifampicin was used as a positive control (MIC = 0.0125 –
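The dilution scheme in this assay can be checked with a short sketch. The dilution factor is not stated; taking the upper limit as 32 μg/ml, the reported end points differ by a factor of 2^16, which is consistent with a twofold series of 17 concentrations. That inference, and the function name, are assumptions.

```python
def serial_dilution(top_conc, n_points, factor=2.0):
    """Concentrations of a serial dilution, starting from the highest."""
    return [top_conc / factor ** step for step in range(n_points)]


# Assumed twofold series starting at 32 ug/ml (17 points).
series_ug_per_ml = serial_dilution(32.0, 17)
lowest_ng_per_ml = series_ug_per_ml[-1] * 1000.0
print(len(series_ug_per_ml), lowest_ng_per_ml)
```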
Minimum Bactericidal Concentration Determination for
Following a literature protocol (Xie et al., 2005), M. tuberculosis H37Rv grown
to the exponential phase (A600 = 0.5) was adjusted to 5 × 10⁵ CFU/ml in 2 ml of
Middlebrook 7H9 medium supplemented with 10% albumin-dextrose-saline
(ADS), 0.5% glycerol, and 0.5% Tween 80. Bacterial cultures were incubated
at 37°C with shaking at low rpm after treatment with various concentrations of
TCMDC-125802 (0.125, 0.25, 0.5, 1.0, 2.0, and 4.0 μg/ml) in DMSO. Following
9 days of incubation, bacterial cultures were serially diluted with sterile
PBS-Tween 80 and plated on Middlebrook 7H11 plates, and CFUs were
enumerated following 21 days of incubation at 37°C. Bacterial CFUs were
represented as a mean ± standard deviation of triplicate samples per experiment.
Killing Kinetics for TCMDC-125802 and INH
M. tuberculosis H37Rv was grown in 7H9 medium supplemented with
10% ADS, 0.5% glycerol, and 0.05% Tween 80 to the exponential phase
(A600 = 0.5) at 37°C and adjusted to a final concentration of 2.0 × 10⁷ CFU/ml.
TCMDC-125802 or INH was dissolved in DMSO at the appropriate concentration
and added to 2 × 10⁷ CFU/ml of actively growing bacterial cultures. One
hundred microliters of culture was collected at 2, 9, 14, and 21 days
posttreatment, 10-fold serially diluted with sterile PBS-Tween 80, and plated on
Middlebrook 7H11 agar plates (Sigma-Aldrich). Bacillary CFU was enumerated
after 21 days of incubation at 37°C. The killing curves were plotted using
GraphPad Prism 4 (GraphPad Software).
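The arithmetic behind CFU enumeration from 10-fold serial dilutions can be sketched as follows. The plated volume is not given in this section, so `plated_volume_ml` is a hypothetical parameter. The colony counts are made up, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def cfu_per_ml(colonies, dilution_steps, plated_volume_ml, fold=10.0):
    """Back-calculate CFU/ml of the undiluted culture from one countable plate."""
    return colonies * fold ** dilution_steps / plated_volume_ml


def summarize(replicates):
    """Mean and sample standard deviation of replicate CFU/ml values."""
    return mean(replicates), stdev(replicates)


def log10_reduction(start_cfu_per_ml, end_cfu_per_ml):
    """Log10 drop in viable count between two time points."""
    return math.log10(start_cfu_per_ml / end_cfu_per_ml)


# Made-up triplicate colony counts from the third 10-fold dilution, 0.1 ml plated.
counts = [42, 51, 45]
replicates = [cfu_per_ml(c, dilution_steps=3, plated_volume_ml=0.1) for c in counts]
avg, sd = summarize(replicates)
drop = log10_reduction(2.0e7, avg)  # relative to the 2 x 10^7 CFU/ml inoculum
print(f"{avg:.3g} +/- {sd:.2g} CFU/ml, log10 reduction {drop:.2f}")
```

Killing curves are conventionally drawn as log10 CFU/ml against time, which is why the reduction is expressed on a log10 scale here.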
Isolation of Bone Marrow-Derived Macrophages
Bone marrow-derived macrophages were isolated from the femurs of 8-week-
old female B6D2F1 mice (Jackson Laboratories) as described (Ehrt et al.,
2001). Macrophages were differentiated in Dulbecco’s modified Eagle’s
medium (GIBCO) supplemented with 10% fetal bovine serum, 1% sodium
pyruvate, 1% L-glutamine, 20% L929 cell-conditioned medium, and 1% penicillin
and streptomycin and cultured in a 5% CO2 incubator for 7 days for
Cellular Toxicity Assays
Vero cells (African green monkey kidney epithelial cells; ATCC) were plated at
1 × 10⁵ cells/well in a 96-well plate and incubated for 2–3 hr to allow cells to
settle. TCMDC-125802 was dissolved in DMSO at a final concentration of
12 mg/ml. Serial dilutions were performed to generate test concentrations
ranging from 256 to 0.125 μg/ml. TCMDC-125802 was then added to plated
cells, resulting in final test concentrations of 128 to 0.0625 μg/ml. To evaluate
cell viability, 20 μl of a 1:20 MTS:PMS (Promega) reagent was added
to each well after 72 hr of incubation at 37°C and the plate was then incubated
for an additional 3 hr. The plate was scanned at an absorbance of 490 nm
utilizing a Molecular Devices SpectraMax M5 microplate reader. The CC50
was extrapolated by plotting absorbance at 490 nm versus concentration of
untreated Vero cells in control plates.
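One way to estimate a CC50 from such absorbance readings is sketched below. The exact procedure used is not spelled out here, so the approach is an assumption: absorbances are normalized to untreated control cells, and the 50% point is interpolated on log concentration between the two bracketing concentrations. Background subtraction is omitted, and all readings are made up.

```python
import math


def percent_viability(absorbances, untreated_absorbance):
    """Express each reading as a percentage of the untreated-cell signal."""
    return [100.0 * a / untreated_absorbance for a in absorbances]


def cc50_by_interpolation(concs, viabilities):
    """Interpolate on log concentration between the two points that bracket 50%."""
    pairs = sorted(zip(concs, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(pairs, pairs[1:]):
        if v_lo != v_hi and (v_lo - 50.0) * (v_hi - 50.0) <= 0.0:
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            return math.exp(math.log(c_lo) + frac * (math.log(c_hi) - math.log(c_lo)))
    return None  # 50% viability was not crossed within the tested range


# Final test concentrations from 128 down to 0.0625, assuming a twofold series.
concs = [128.0 / 2 ** i for i in range(12)]
# Made-up A490 readings, listed in the same (descending concentration) order.
a490 = [0.10, 0.12, 0.20, 0.45, 0.80, 1.05, 1.15, 1.18, 1.20, 1.19, 1.21, 1.20]
viabilities = percent_viability(a490, untreated_absorbance=1.20)
cc50 = cc50_by_interpolation(concs, viabilities)
print(cc50)
```

The function returns `None` when viability never crosses 50%, in which case the CC50 can only be reported as greater than the highest concentration tested.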
Mouse Bone Marrow-Derived Macrophages
The toxicity of TCMDC-125802 for the macrophages was determined by
the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT)
cytotoxicity assay with the Vybrant MTT Cell Proliferation Assay Kit (Molecular
Probes). Briefly, cells were plated at 10⁵ cells/well in a 96-well plate in
triplicate. Following 3 hr of incubation with MTT at 37°C in a 5% CO2 incubator,
the tested molecule in various concentrations was added to the cells
and the cytotoxicity determination was performed at 48 hr posttreatment.
Following 3 hr of incubation, solubilization/stop solution was added to each
well. One hour later, the absorbances at 570 nm were read with a VersaMax
ELISA microplate reader (Molecular Devices). The values were normalized to
those of medium control.
Maximum Tolerated Dose Assessment
TCMDC-125802 (formulated in 0.5% methylcellulose) was administered as
a single dose (30, 100, and 300 mg/kg) by gavage to 8-week-old C57BL/6
mice for 3 consecutive days. These mice were then observed over a period
of 7 days for adverse effects.
Acute Model of Infection
A low-dose aerosol exposure of M. tuberculosis Erdman was used to infect
12-week-old γ-interferon knockout C57BL/6 mice (Lenaerts et al., 2003).
Three groups of five mice were used: group 1: positive control of INH dissolved
in 1% methylcellulose (25 mg/kg); group 2: untreated negative control; and
group 3: methylcellulose (0.5%)-formulated TCMDC-125802 (synthetic
material) via oral gavage. Thirteen days postinfection, the positive-control and
drug-treated mice were administered INH and TCMDC-125802, respectively,
for 9 consecutive days (until day 21). One day after cessation of
TCMDC-125802 dosing (day 22), the animals were euthanized. The number of viable
organisms was determined by plating serial dilutions of organ homogenates on
nutrient Middlebrook 7H11 agar plates (Becton Dickinson). The plates were
incubated at 37°C for 4 weeks prior to the counting of viable M. tuberculosis
colonies.
Supplemental Information
Supplemental Information includes seven figures, five tables, and Supple-
mental Experimental Procedures and can be found with this article online at
Acknowledgments
Accelrys is kindly acknowledged for providing Discovery Studio (to S.E.). We
acknowledge Dr. Eric L. Nuermberger (Johns Hopkins University) for valuable
discussions. The CDD TB database has been developed thanks to funding
from the Bill and Melinda Gates Foundation (grant no. 49852, ‘‘Collaborative
Drug Discovery for TB through a Novel Database of SAR Data Optimized to
Promote Data Archiving and Sharing’’). The project described was supported
by award no. R43 LM011152-01, ‘‘Biocomputation across Distributed Private
Datasets to Enhance Drug Discovery’’ from the National Library of Medicine.
R.C.R. acknowledges American Reinvestment and Recovery Act grant no.
1RC1AI086677-01 (National Institute of Allergy and Infectious Diseases
[NIAID], National Institutes of Health), ‘‘Targeting MDR-TB’’. J.S.F. acknowl-
edges funding from New Jersey Medical School, University of Medicine
and Dentistry of New Jersey (UMDNJ) and the Foundation of UMDNJ.
A.J.L. is grateful for support through TB contract no. N01 Al-95385 (NIAID
Project through Dr. Tina Parker). S.E. is a consultant for Collaborative Drug
Discovery, Inc. B.A.B. is an employee and shareholder of Collaborative Drug
Received: November 17, 2012
Revised: December 21, 2012
Accepted: January 3, 2013
Published: March 21, 2013
References
Ananthan, S., Faaleolea, E.R., Goldman, R.C., Hobrath, J.V., Kwong, C.D.,
Laughon, B.E., Maddry, J.A., Mehta, A., Rasmussen, L., Reynolds, R.C.,
et al. (2009). High-throughput screening for inhibitors of Mycobacterium
tuberculosis H37Rv. Tuberculosis (Edinb.) 89, 334–353.
Andries, K., Verhasselt, P., Guillemont, J., Göhlmann, H.W., Neefs, J.M.,
Winkler, H., Van Gestel, J., Timmerman, P., Zhu, M., Lee, E., et al. (2005).
A diarylquinoline drug active on the ATP synthase of Mycobacterium
tuberculosis. Science 307, 223–227.
Ballell, L., Field, R.A., Duncan, K., and Young, R.J. (2005). New small-
molecule synthetic antimycobacterials. Antimicrob. Agents Chemother. 49,
Barry, C.E., III, Slayden, R.A., Sampson, A.E., and Lee, R.E. (2000). Use of
genomics and combinatorial chemistry in the development of new antimyco-
bacterial drugs. Biochem. Pharmacol. 59, 221–231.
Bender, A., Scheiber, J., Glick, M., Davies, J.W., Azzaoui, K., Hamon, J.,
Urban, L., Whitebread, S., and Jenkins, J.L. (2007). Analysis of pharmacology
data and the prediction of adverse drug reactions and off-target effects from
chemical structure. ChemMedChem 2, 861–873.
Bruhin, H., Bühlmann, X., Hook, W.H., Hoyle, W., Orford, B., and Vischer, W.
(1969). Antituberculosis activity of some nitrofuran derivatives. J. Pharm.
Pharmacol. 21, 423–433.
Clark, A.M., Ekins, S., and Williams, A.J. (2012). Redefining cheminformatics
with intuitive collaborative mobile apps. Mol. Inform. 31, 569–584.
Collins, L., and Franzblau, S.G. (1997). Microplate Alamar blue assay versus
BACTEC 460 system for high-throughput screening of compounds against
Mycobacterium tuberculosis and Mycobacterium avium. Antimicrob. Agents
Chemother. 41, 1004–1009.
Ehrt, S., Schnappinger, D., Bekiranov, S., Drenkow, J., Shi, S., Gingeras, T.R.,
Gaasterland, T., Schoolnik, G., and Nathan, C. (2001). Reprogramming of the
macrophage transcriptome in response to interferon-γ and Mycobacterium
tuberculosis: signaling roles of nitric oxide synthase-2 and phagocyte oxidase.
J. Exp. Med. 194, 1123–1140.
Ekins, S., and Freundlich, J.S. (2011). Validating new tuberculosis computa-
tional models with public whole cell screening aerobic activity datasets.
Pharm. Res. 28, 1859–1869.
Ekins, S., Honeycutt, J.D., and Metz, J.T. (2010a). Evolving molecules using
multi-objective optimization: applying to ADME/Tox. Drug Discov. Today 15,
Ekins, S., Williams, A.J., and Xu, J.J. (2010b). A predictive ligand-based
Bayesian model for human drug-induced liver injury. Drug Metab. Dispos.
Ekins, S., Bradford, J., Dole, K., Spektor, A., Gregory, K., Blondeau, D.,
Hohman, M., and Bunin, B.A. (2010c). A collaborative database and computa-
tional models for tuberculosis drug discovery. Mol. Biosyst. 6, 840–851.
Ekins, S., Kaneko, T., Lipinski, C.A., Bradford, J., Dole, K., Spektor, A.,
Gregory, K., Blondeau, D., Ernst, S., Yang, J., et al. (2010d). Analysis and
hit filtering of a very large library of compounds screened against
Mycobacterium tuberculosis. Mol. Biosyst. 6, 2316–2324.
Ekins, S., Freundlich, J.S., Choi, I., Sarker, M., and Talcott, C. (2011).
Computational databases, pathway and cheminformatics tools for tubercu-
losis drug discovery. Trends Microbiol. 19, 65–74.
Gamo, F.-J., Sanz, L.M., Vidal, J., de Cozar, C., Alvarez, E., Lavandera, J.-L.,
Vanderwall, D.E., Green, D.V.S., Kumar, V., Hasan, S., et al. (2010). Thousands
of chemical starting points for antimalarial lead identification. Nature 465,
Ganguly, M., Brown, N., Schuffenhauer, A., Ertl, P., Gillet, V.J., and Greenidge,
P.A. (2006). Introducing the consensus modeling concept in genetic algo-
rithms: application to interpretable discriminant analysis. J. Chem. Inf.
Model. 46, 2110–2124.
Garau, J. (2008). Other antimicrobials of interest in the era of extended-spectrum
β-lactamases: fosfomycin, nitrofurantoin and tigecycline. Clin. Microbiol.
Infect. 14(Suppl 1), 198–202.
W.C., Warrier, T., Somersan, S., Venugopal, A., et al. (2012). Nonsteroidal
anti-inflammatory drug sensitizes Mycobacterium tuberculosis to endogenous and
exogenous antimicrobials. Proc. Natl. Acad. Sci. USA 109, 16004–16011.
Grosset, J.H., Singer, T.G., and Bishai, W.R. (2012). New drugs for the treatment
of tuberculosis: hope and reality. Int. J. Tuberc. Lung Dis. 16, 1005–1014.
Hassan, M., Brown, R.D., Varma-O’Brien, S., and Rogers, D. (2006).
Cheminformatics analysis and learning in a data pipelining environment.
Mol. Divers. 10, 283–299.
Hohman, M., Gregory, K., Chibale, K., Smith, P.J., Ekins, S., and Bunin, B.
(2009). Novel web-based tools combining chemistry informatics, biology and
social networks for drug discovery. Drug Discov. Today 14, 261–270.
Klon, A.E., Lowrie, J.F., and Diller, D.J. (2006). Improved naïve Bayesian
modeling of numerical data for absorption, distribution, metabolism and
excretion (ADME) property prediction. J. Chem. Inf. Model. 46, 1945–1956.
Konno, K., Feldmann, F.M., and McDermott, W. (1967). Pyrazinamide suscep-
tibility and amidase activity of tubercle bacilli. Am. Rev. Respir. Dis. 95,
Koul, A., Arnoult, E., Lounis, N., Guillemont, J., and Andries, K. (2011). The
challenge of new drug discovery for tuberculosis. Nature 469, 483–490.
Langdon, S.R., Mulgrew, J., Paolini, G.V., and van Hoorn, W.P. (2010).
Predicting cytotoxicity from heterogeneous data sources with Bayesian
learning. J. Cheminform. 2, 11.
Ledford, H. (2011). Translational research: 4 ways to fix the clinical trial. Nature
Lee, R.E., Protopopova, M., Crooks, E., Slayden, R.A., Terrot, M., and Barry,
C.E., III. (2003). Combinatorial lead optimization of [1,2]-diamines based on
ethambutol as potential antituberculosis preclinical candidates. J. Comb.
Chem. 5, 172–187.
Lenaerts, A.J., Gruppo, V., Brooks, J.V., and Orme, I.M. (2003). Rapid in vivo
screening of experimental drugs for tuberculosis using γ interferon
gene-disrupted mice. Antimicrob. Agents Chemother. 47, 783–785.
Lienhardt, C., Glaziou, P., Uplekar, M., Lönnroth, K., Getahun, H., and
Raviglione, M. (2012a). Global tuberculosis control: lessons learnt and future
prospects. Nat. Rev. Microbiol. 10, 407–416.
Lienhardt, C., Raviglione, M., Spigelman, M., Hafner, R., Jaramillo, E.,
Hoelscher, M., Zumla, A., and Gheuens, J. (2012b). New drugs for the treat-
ment of tuberculosis: needs, challenges, promise, and prospects for the
future. J. Infect. Dis. 205(Suppl 2), S241–S249.
Macarron, R., Banks, M.N., Bojanic, D., Burns, D.J., Cirovic, D.A., Garyantes,
T., Green, D.V., Hertzberg, R.P., Janzen, W.P., Paslay, J.W., et al. (2011).
Impact of high-throughput screening in biomedical research. Nat. Rev. Drug
Discov. 10, 188–195.
Maddry, J.A., Ananthan, S., Goldman, R.C., Hobrath, J.V., Kwong, C.D.,
Maddox, C., Rasmussen, L., Reynolds, R.C., Secrist, J.A., III, Sosa, M.I.,
et al. (2009). Antituberculosis activity of the Molecular Libraries Screening
Center Network library. Tuberculosis (Edinb.) 89, 354–363.
Magnet, S., Hartkoorn, R.C., Székely, R., Pató, J., Triccas, J.A., Schneider, P.,
Szántai-Kis, C., Orfi, L., Chambon, M., Banfi, D., et al. (2010). Leads for
antitubercular compounds from kinase inhibitor library screens. Tuberculosis
(Edinb.) 90, 354–360.
Mak, P.A., Rao, S.P., Ping Tan, M., Lin, X., Chyba, J., Tay, J., Ng, S.H., Tan,
B.H., Cherian, J., Duraiswamy, J., et al. (2012). A high-throughput screen to
identify inhibitors of ATP homeostasis in non-replicating Mycobacterium
tuberculosis. ACS Chem. Biol. 7, 1190–1197.
Murugasu-Oei, B., and Dick, T. (2000). Bactericidal activity of nitrofurans
against growing and dormant Mycobacterium bovis BCG. J. Antimicrob.
Chemother. 46, 917–919.
Nathan, C. (2011). Making space for anti-infective drug discovery. Cell Host
Microbe 9, 343–348.
Nuermberger, E.L., Spigelman, M.K., and Yew, W.W. (2010). Current develop-
ment and future prospects in chemotherapy of tuberculosis. Respirology 15,
O’Connor, C.J., Beckmann, H.S., and Spring, D.R. (2012). Diversity-oriented
synthesis: producing chemical tools for dissecting biology. Chem. Soc. Rev.
Oprea, T.I., Davis, A.M., Teague, S.J., and Leeson, P.D. (2001). Is there a
difference between leads and drugs? A historical perspective. J. Chem. Inf.
Comput. Sci. 41, 1308–1315.
bad bugs: confronting the challenges of antibacterial discovery. Nat. Rev.
Drug Discov. 6, 29–40.
Periwal, V., Rajappan, J.K., Open Source Drug Discovery Consortium, Jaleel,
A.U., and Scaria, V. (2011). Predictive models for anti-tubercular molecules
using machine learning on high-throughput biological screening datasets.
BMC Res. Notes 4, 504.
Prathipati, P., Ma, N.L., and Keller, T.H. (2008). Global Bayesian models for the
prioritization of antitubercular agents. J. Chem. Inf. Model. 48, 2362–2370.
Reynolds, R.C., Ananthan, S., Faaleolea, E., Hobrath, J.V., Kwong, C.D.,
Maddox, C., Rasmussen, L., Sosa, M.I., Thammasuvimol, E., White, E.L.,
et al. (2012). High throughput screening of a library based on kinase inhibitor
scaffolds against Mycobacterium tuberculosis H37Rv. Tuberculosis (Edinb.)
Rogers, D., Brown, R.D., and Hahn, M. (2005). Using extended-connectivity
fingerprints with Laplacian-modified Bayesian analysis in high-throughput
screening follow-up. J. Biomol. Screen. 10, 682–686.
Rozwarski, D.A., Grant, G.A., Barton, D.H., Jacobs, W.R., Jr., and Sacchettini,
J.C. (1998). Modification of the NADH of the isoniazid target (InhA) from
Mycobacterium tuberculosis. Science 279, 98–102.
Sacchettini, J.C., Rubin, E.J., and Freundlich, J.S. (2008). Drugs versus bugs:
in pursuit of the persistent predator Mycobacterium tuberculosis. Nat. Rev.
Microbiol. 6, 41–52.
Sandegren, L., Lindqvist, A., Kahlmeter, G., and Andersson, D.I. (2008).
Nitrofurantoin resistance mechanism and fitness cost in Escherichia coli.
J. Antimicrob. Chemother. 62, 495–503.
Sarker, M., Talcott, C., Madrid, P., Chopra, S., Bunin, B.A., Lamichhane, G.,
Freundlich, J.S., and Ekins, S. (2012). Combining cheminformatics methods
and pathway analysis to identify molecules with whole-cell activity against
Mycobacterium tuberculosis. Pharm. Res. 29, 2115–2127.
Saunders, J. (1992). Non-nucleoside inhibitors of HIV reverse transcriptase:
screening successes—clinical failures. Drug Des. Discov. 8, 255–263.
Schneider, G. (2010). Virtual screening: an endless staircase? Nat. Rev. Drug
Discov. 9, 273–276.
Schuster, D., Laggner, C., and Langer, T. (2005). Why drugs fail—a study on
side effects in new chemical entities. Curr. Pharm. Des. 11, 3545–3559.
Scorpio, A., and Zhang, Y. (1996). Mutations in pncA, a gene encoding pyrazi-
namidase/nicotinamidase, cause resistance to the antituberculous drug pyra-
zinamide in tubercle bacillus. Nat. Med. 2, 662–667.
Singh, R., Manjunatha, U., Boshoff, H.I., Ha, Y.H., Niyomrattanakit, P.,
Ledwidge, R., Dowd, C.S., Lee, I.Y., Kim, P., Zhang, L., et al. (2008). PA-824
kills nonreplicating Mycobacterium tuberculosis by intracellular NO release.
Science 322, 1392–1395.
Stanley, S.A., Grant, S.S., Kawate, T., Iwase, N., Shimizu, M., Wivagg, C.,
Silvis, M., Kazyanskaya, E., Aquadro, J., Golas, A., et al. (2012).
Identification of novel inhibitors of M. tuberculosis growth using whole cell
based high-throughput screening. ACS Chem. Biol. 7, 1377–1384.
Stover, C.K., Warrener, P., VanDevanter, D.R., Sherman, D.R., Arain, T.M.,
Langhorne, M.H., Anderson, S.W., Towell, J.A., Yuan, Y., McMurray, D.N.,
et al. (2000). A small-molecule nitroimidazopyran drug candidate for the
treatment of tuberculosis. Nature 405, 962–966.
Vilchèze, C., Baughn, A.D., Tufariello, J., Leung, L.W., Kuo, M., Basler, C.F.,
Alland, D., Sacchettini, J.C., Freundlich, J.S., and Jacobs, W.R., Jr. (2011).
Novel inhibitors of InhA efficiently kill Mycobacterium tuberculosis under
aerobic and anaerobic conditions. Antimicrob. Agents Chemother. 55, 3889–
Volarath, P., Harrison, R.W., and Weber, I.T. (2007). Structure based drug
design for HIV protease: from molecular modeling to cheminformatics. Curr.
Top. Med. Chem. 7, 1030–1038.
Walters, W.P., and Murcko, M.A. (2002). Prediction of ‘drug-likeness’. Adv.
Drug Deliv. Rev. 54, 255–271.
Whiteway, J., Koziarz, P., Veall, J., Sandhu, N., Kumar, P., Hoecher, B., and
Lambert, I.B. (1998). Oxygen-insensitive nitroreductases: analysis of the roles
of nfsA and nfsB in development of resistance to 5-nitrofuran derivatives in
Escherichia coli. J. Bacteriol. 180, 5529–5539.
Xie, Z., Siddiqi, N., and Rubin, E.J. (2005). Differential antibiotic suscepti-
bilities of starved Mycobacterium tuberculosis isolates. Antimicrob. Agents
Chemother. 49, 4778–4780.
Zientek, M., Stoner, C., Ayscue, R., Klug-McLeod, J., Jiang, Y., West, M.,
Collins, C., and Ekins, S. (2010). Integrated in silico-in vitro strategy for
addressing cytochrome P450 3A4 time-dependent inhibition. Chem. Res.
Toxicol. 23, 664–676.
Chemistry & Biology
Bayesian Models for Mtb Drug Discovery
Chemistry & Biology 20, 370–378, March 21, 2013 ©2013 Elsevier Ltd All rights reserved