BioSuite: A comprehensive bioinformatics software package (A Unique Industry-Academia Collaboration)

GENERAL ARTICLES
CURRENT SCIENCE, VOL. 92, NO. 1, 10 JANUARY 2007 29
The team consists of Tata Consultancy Services: M. Vidyasagar, S.
Mande, S. Rajgopal, B. Gopalkrishnan, S. T. P. T. Srinivas, C. Uma
Maheswara Rao, T. Kathiravan, K. Mastanarao, S. Narendranath, S.
Rohini, A. Irshad, T. Murali, C. Subrahmanyam, T. Mona, S. Sankha,
V. Priya, D. Suman, V. V. Raja Rao, P. Nageswara Rao, R. Issaac, H.
Yashodeep, B. Arundhoti, G. Nishant, S. Jignesh, K. S. Chaitanya, S. P.
V. Prasad Reddy; Bose Institute: P. Chakraborty; Centre for DNA
Fingerprinting and Diagnosis: S. E. Hasnain, S. Mande, A. Nagarajaram,
A. Ranjan, M. S. Acharya, M. Anwaruddin, S. K. Arun, Gyanrajkumar,
D. Kumar, S. Priya, S. Ranjan, B. R. Reddi, J. Seshadri, P. Sravan
Kumar, S. Swaminathan, P. Umadevi, V. Vindal, S. Vijaykrishnan;
Central Drug Research Institute: A. K. Saxena, A. Dixit, P. Prathipati,
S. K. Kashaw; Indian Institute of Chemical Biology: C. Mandal, S.
Bag; Indian Institute of Science: N. Balakrishnan, M. Bansal, N. R.
Chandra*, M. R. N. Murthy, S. Ramakumar, K. Sekar, N. Srinivasan,
K. Suguna, S. Vishveshwara*, R. Anandhi, Bhadra, S. Das, P. Hansia,
S. Hariharaputran, J. Jeyakani, R. Karthikeyan, R. K. Pandey, C. S.
Swamy, B. Vasanthakumar; Indian Institute of Technology Bombay:
P. V. Balaji, R. Y. Patel; Indian Institute of Technology Delhi:
B. Jayaram, S. A. Shaikh; Indian Institute of Technology Kharagpur:
P. P. Chakrabarti, A. Banerjee, A. Chakrabarti; Indian Statistical
Institute: R. L. Karandikar, Delhi and P. Chaudhuri, Kolkata; Institute
of Microbial Technology: G. P. S. Raghava, A. Ghosh; Institute of
Bioinformatics and Applied Biotechnology: M. Bansal, N. Paramsivam;
Institute of Genomics and Integrative Biology: S. K. Brahmachari, D.
Dash, C. Balasubramaniam, A. Basu, P. Biswas, M. Hariharan, R.
Mathur, K. S. Sandhu, V. Scaria, R. Shankar; International Institute
of Information Technology: P. J. Narayanan, V. Jain, Nirnimesh; Madurai
Kamaraj University: S. Krishnaswamy, V. Alaguraj, R. Marikkannu,
A. V. S. K. Mohan Katta, N. Krishnan, K. V. Srividhya, P. J. Eswari;
National Institute of Pharmaceutical Education and Research: P. V.
Bharatam, P. Iqbal; Saha Institute of Nuclear Physics: D.
Bhattacharyya; University of Hyderabad: G. R. Desiraju, J. J. Kumar,
M. Ravikumar; University of Madras: M. Gautham, P. A. Prasad and
D. Bharanidharan. *For correspondence. (nchandra@physics.iisc.ernet.in;
sv@mbu.iisc.ernet.in)
BioSuite: A comprehensive bioinformatics
software package (A unique industry–academia
collaboration)
The NMITLI-BioSuite Team*
Keywords: Bioinformatics, BioSuite, industry–academia collaboration, software.
The last decade has witnessed an exponential growth of information in
the field of biological macromolecules such as proteins and nucleic
acids and their interactions with other molecules. Computational
analysis and predictions based on such information are increasingly
becoming an essential and integral part of modern biology. With rapid
advances in the area, there is a growing need to develop versatile
bioinformatics software packages that are efficient and incorporate
the latest developments in this field. In view of this, the Council of
Scientific and Industrial Research, India, undertook an initiative to
promote a unique industry–academia collaboration to develop a
comprehensive bioinformatics software package, under its New
Millennium Initiative for Technology Leadership in India programme.
BioSuite, a product of that effort, has been developed by Tata
Consultancy Services, which took the primary coding responsibility,
with significant backing from a large academic community that
participated in advisory roles throughout the project period.
BioSuite integrates the functions of macromolecular sequence and
structural analysis, chemoinformatics and algorithms for aiding drug
discovery. The suite, organized into four major modules, contains 79
different programs, making it one of the few comprehensive suites that
cater to a major part of the spectrum of bioinformatics applications.
The four major modules, (a) Genome and proteome sequence analysis, (b)
3D modelling and structural analysis, (c) Molecular dynamics
simulations and (d) Drug design, are made available through a
convenient graphical user interface along with adequate documentation
and tutorials.
The unique partnership with academia has also ensured that the best
available methodology has been adopted for each of the 79 programs,
each of which has been thoroughly evaluated in several stages, leading
to the high scientific value of the suite. The software, apart from
having the advantage of running on a Linux platform on a personal
computer, is also flexible and modular, and allows newer algorithms to
be plugged into the overall framework. The package will be valuable
for high-quality academic research, industrial research and
development, and teaching, both within the country and in the
international arena. A full list of the programs as well as their
example usage can be found at
http://www.atc.tcs.co.in/bioinfo/publications/biosuite_paper.pdf.
Background
Genesis of BioSuite
The Council of Scientific and Industrial Research (CSIR), Government
of India, proposed the New Millennium Initiative for Technology
Leadership in India (NMITLI) in 2000, through which India could
acquire leadership positions in key technology areas (NMITLI).
Development of versatile, portable bioinformatics software was
recognized as one such area, taking into account the expertise
available in the Indian academic community. Such a project, promoted
by CSIR, was therefore flagged off in partnership with industry, where
Tata Consultancy Services (TCS) took the major responsibility of
developing the BioSuite software with significant scientific support
from the major academic institutions in the country (Table 1). The
objective of the project has been to develop indigenously a set of
software tools that would assist academic research, R&D and industrial
applications in the rapidly emerging fields of bioinformatics and
rational drug design.

Table 1. Roles played by different groups for ensuring successful development of BioSuite
Role | Group
Algorithm design, code writing, coding quality checks, graphical user interfaces and performance benchmarking | Tata Consultancy Services, team led by M. Vidyasagar, Sharmila Mande and Rajagopal Srinivasan
Algorithm/module design suggestions and scientific evaluations | Academic partners
Project monitoring committee | R. Narasimha, G. Padmanaban, G. R. Desiraju, D. Balasubramanian
Project co-ordination | Yogeswara Rao and Vibha Sawhney, CSIR
Project funding | CSIR, NMITLI Scheme, Govt of India
Manuscript preparation | Coordinated by Nagasuma Chandra and Saraswathi Vishveshwara, IISc

The need for such a software suite is exemplified by two main factors:
(a) the increase in bioinformatics activities at all levels (education,
research and industry), together with the rapid growth of primary data
and methods in computational biology, and (b) the limitations of
existing suites, such as very high cost and not being comprehensive
under a single framework, as discussed later. A team of 35 members
from TCS worked on this project.
Mode of operation
To ensure the smooth functioning of the project, the following
management structure was put in place: (a) a Monitoring Committee
monitored the progress of the project through periodic meetings with
TCS and the academic partners, providing timely focus; (b) a Steering
Committee, consisting of scientists from academic institutions and
TCS, coordinated the activities of the group; (c) domain experts and
consultants, consisting of all academic partners, helped in arriving
at a basic structure for the suite. Given the large size of the group
and the involvement of 18 institutions, the efforts of CSIR and the
monitoring committees have played a significant role in fostering the
unique partnership and ensuring the success of this project. The
domain experts have advised TCS on the individual modules and the
individual programs required in each module, and have identified
appropriate algorithms at each step as well as the features required
for each program, as per current research trends and requirements.
Further, (d) a team of six pseudo-code developers at TCS has
interacted with the domain experts and directed (e) the in-house team
of code developers, consisting of 27 software engineers, who have
written the actual code. (f) The Software Project Management Committee
from TCS has overseen the overall activities at that end and ensured
appropriate benchmarking and in-house quality checks from the software
perspective. The scientific performance of the codes developed has
been further evaluated by the academic partners, who have tested them
and reported bugs to the Project Management Committee, after which the
codes have been improved or modified where required. Further, an
autonomous assessment of the suite has been obtained from an
independent expert in the area.
Operational schedules
A glimpse of the schedule and the various milestones reached is given
below. (a) Identification of the modules, the required programs in
each module and the appropriate algorithm(s) for each program was
completed in the first four months, following which (b) a Software
Requirement Specification (SRS) document was developed and reviewed in
the next two months. Next, the pseudo-codes were developed in about
five months and converted into final code in the next 12 months.
Alpha-testing was carried out simultaneously with code development; in
parallel, the documentation and creation of a user guide took about
seven months. Bug reporting and bug fixes were carried out in
iterations through the testing phases and a beta version was produced
by June 2004, taking a total of 24 months. Evaluation and bug fixing
of this version was carried out in five months, leading to the first
full version, which was soft-launched in July 2004 and released as a
product in December 2004.
Overview of the organization of the suite
The entire package, consisting of 79 different programs, is organized
into four major modules, all linked through three common graphical
user interface (GUI) workbenches, as illustrated in Figure 1. The four
modules are: (a) Genome and sequence analysis, (b) 3D modelling and
structure analysis, (c) Molecular dynamics simulations and (d) Drug
design. They are accessible through central GUIs for file handling,
sequence and structure windows.
Figure 1. Modular organization of BioSuite.
Table 2. Examples of programs contained in the modules
Sequence and genome analysis
Genome sequence assembly and EST mapping [1], ePCR [2], ORF prediction [3], Intron–exon boundary [4], Database search [5] and sequence alignments (pairwise [6,7]; multiple [8]; whole genome alignment [9]); Motifs and patterns (restriction sites [10], motif building and searching [11]; primer and probe design [12]); RNA and protein secondary structure and transmembrane prediction [13–15]; Domain building and searching [16], gene order [17], unique genes [18]; Phylogenetic analysis, tree construction, evolutionary distance estimation and profiling [19–21].
Structural analysis
Nucleic acid analysis [22], protein structure quality check [23], symmetry-related molecules, structural superposition [24], interactions [25], homology modelling and threading [26]; Fold classification [27]; Molecular surface area, solvent accessible surface area and volume [28]; Binding site detection (PASS [29]; ET [30]).
Simulations
Energy minimizations (steepest descent [31] and conjugate gradient minimizers [32]; forcefields [33]); Electrostatic potential maps [34,35]; Molecular dynamics [36,37]; MD analysis of various trajectories, RMSD, average position and plots of system properties.
Drug design
Structure-based design using protein–ligand docking [38]; Conformation search [39]; Steric and electrostatic ligand alignment [40]; QSAR with over 80 descriptors and regression analysis; Pharmacophore identification and pharmacophore-based search [41,42].
Table 2 lists the important programs in each module. A full list of
the modules as well as example outputs of the individual programs can
be found at
http://www.atc.tcs.co.in/bioinfo/publications/biosuite_paper.pdf.
The combination of the four modules makes BioSuite a comprehensive
package, covering much of the bioinformatics spectrum: from genome
sequences to individual and multiple protein sequences, different
levels of structure prediction, analysis of the structures, molecular
mechanics calculations, molecular dynamics simulations and
chemoinformatics, and finally the application of the sequence and
structural analyses to rational drug design through algorithms for
QSAR, pharmacophore identification and docking.
Choice of algorithms and coding methods
The choice of algorithms was discussed extensively with the academic
partners and the latest concepts available in the literature have been
adopted wherever possible. For some programs, more than one algorithm
has been implemented, to suit the current research trend of using
multiple methods and studying consensus predictions. In general, about
two scientists have analysed and chosen a particular algorithm for a
particular purpose. Table 2 indicates the algorithms chosen for each
of the programs. The knowledge and description of each of the
algorithms have been captured in detailed SRS documents by the
pseudo-code development team at TCS, through extensive interactions
with the academic partners as well as a detailed study of the
appropriate literature. The pseudo-code for each algorithm and its
linkages has been developed using formal software engineering methods,
so as to guarantee correctness. The pseudo-code was then converted
into actual code by another set of programmers, who have ensured
strict adherence to well-established quality processes such as CMMI
Level 5.
All codes have been written in C++. A total of 170 algorithms and
about 100 QSAR descriptor calculators have been implemented in 79
programs, with about 700,000 lines of code. The suite is modular,
which not only facilitates seamless updating of the modules but also
enables integration of new programs by the end users.
Description of the modules
The functionalities of the programs contained within the
four major modules are briefly described below.
Genome and proteome sequence analysis
This module deals with applications relating to the analysis of
nucleic acid and protein sequences, not only of individual molecules
but also of complete genome and proteome sequences. It enables
researchers to annotate genomes, predict protein secondary structures,
derive phylogenetic relationships among organisms and compare two
genomes for similarities at the gene or protein level, along with a
range of other applications. The module is further divided into four
sub-modules: Sequence analysis, Genome analysis, Comparative genomics
and Utilities.
Sequence analysis of individual molecules is enabled through the
‘Sequence analysis’ sub-module, while the programs in the ‘Genome
analysis’ sub-module enable comparison and analysis of full genomes
and proteomes. Two database searching tools, BLAST and PSI-BLAST, are
interfaced with the suite; they enable searching databases to identify
a given sequence, find conserved domains or even find distantly
related homologues from other species. An option for building
custom-made databases is also provided. Alignment of sequences, a
crucial task in sequence analysis, is provided through two
well-established dynamic programming algorithms for global and local
alignment (Needleman–Wunsch and Smith–Waterman, respectively).
Further, a hierarchical clustering-based multiple alignment algorithm
(ClustalW) is included for aligning a set of sequences. In addition,
pattern identification and matching tasks such as finding composition,
inverted repeats, DNA structure motifs, restriction site analysis and
repeat analysis are part of this module.
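The two dynamic programming alignments mentioned above can be illustrated with a minimal Python sketch. It is not BioSuite code (the suite is written in C++), and the function name `align_score`, the simple match/mismatch scores and the linear gap penalty are assumptions chosen only for illustration; practical implementations typically use substitution matrices and affine gap penalties, and also trace back the alignment itself.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False: Needleman-Wunsch (global) score.
    local=True:  Smith-Waterman (local) score.
    Scoring values are illustrative assumptions, not BioSuite defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]
```

The only differences between the two modes are the boundary initialization and the floor at zero, which is what lets the local variant report the best-matching sub-region instead of an end-to-end alignment.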
Algorithms for secondary structure prediction, including
transmembrane region detection, and for RNA structure prediction and
analysis are also part of this module. The secondary structure
prediction algorithms were trained (or retrained as appropriate) using
a comprehensive, non-redundant dataset of 731 high-resolution protein
structures (resolution of 2 Å or better); redundancy was removed
through sequence comparisons, using a similarity cut-off of 25% with
the BLOSUM62 substitution matrix. Use of a large dataset in training
the prediction algorithms helps to ensure high prediction accuracy. A
comprehensive biophysical parameter computation ability has also been
built into BioSuite, by extracting 36 different physico-chemical
properties for protein molecules from the dataset and subsequently
using them as training sets in the prediction algorithms. Algorithms
for predicting isoelectric point, peptide cleavage patterns and B-cell
antigenicity from protein sequences are also included in this module.
Yet another useful feature of this module is the domain building and
analysis functionality. Programs are available for identifying
domains, building consensus domain sequences, calibrating them and
searching across a database. Hidden Markov models using sequence
profiles are used for these purposes. In addition, the module has
programs for studying molecular evolution, for clustering groups of
sequences based on several criteria, and for computing phylogenetic
trees and evolutionary distances. Finally, algorithms for gene
finding, gene assembly, probe and primer design, vector trimming and
EST analysis are also part of this module. Two examples of using the
programs of this module are illustrated in Figure 2 a and b.
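As an illustration of what an evolutionary distance calculation involves, the following Python sketch computes the observed proportion of differing sites (the p-distance) between two aligned nucleotide sequences and corrects it with the Jukes–Cantor model. The text does not state which distance models BioSuite implements, so the choice of Jukes–Cantor and the function names are assumptions made purely for illustration.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for nucleotide sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as the p-distance, since it accounts for multiple substitutions at the same site; a matrix of such pairwise distances is the usual input to distance-based tree construction.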
3D Modelling and analysis
The 3D modelling and analysis module has capabilities to build,
analyse and predict three-dimensional structures of macromolecules and
macromolecular complexes. This module is further subdivided into the
following sub-modules: (a) Homology modelling, (b) Threading, (c)
Building proteins, (d) Building nucleic acids, (e) Building
carbohydrates, (f) Generation of symmetry-related molecules, (g)
Structural superposition, (h) Surfaces and volumes, (i) Binding site
analysis, (j) Nucleic acid analysis, (k) Interactions, (l) Quality
check, and (m) Fold detection. Example snapshots are shown in Figure 2
c and d.
Building models of protein molecules by predicting their
three-dimensional structures through comparative modelling techniques
is enabled by the first two sub-modules, for which six algorithms are
available that incorporate the latest concepts in these areas.
Building nucleic acids and carbohydrates using geometric information
is enabled through the building sub-modules. A notable feature of the
builder programs is the incorporation of 17 geometrical templates for
nucleic acids and 12 templates for carbohydrates, providing a handle
to address the stereochemical variability in a large number of sugars.
Several programs for visualization and analysis of
crystallographically derived structures are also included in this
module. For example, a lattice assembly of a protein molecule, as seen
in its crystal structure, can be generated easily. Structure
validation tools for proteins and nucleic acids are provided through
the quality check programs. Extensive analysis is possible through the
analysis and interactions functions, which can be used for analysing
integral features of protein structure, protein–protein interactions
as well as protein–ligand interactions. Finally, algorithms for
classifying protein structures in relation to the other protein
structures known in the literature are also included in this module
through the fold detection routines. Here too, the unique integration
of building, analysis and structural bioinformatics tools such as
structure classification, all within one framework, significantly
enhances the technical value of BioSuite.

Figure 2 a–h. Example snapshots from various modules of BioSuite: a, Genome comparison: Mapping Protein gi|42525869, from Bacillus halodurans to Clusters of Orthologous Groups (COG no. 1893), by using orthologues. A homologue of a lipase from Treponema denticola gi|42525869 was identified from Bacillus halodurans; b, Protein secondary structure prediction using different methods and property profiles derived for the lipase protein sequence; c, Different molecular representations in BioSuite – (a) ball-and-stick, (b) cartoons, (c) molecular surface, (d) van der Waals surface, (e) space fill, (f) C-alpha trace, (g) sticks, (h) ribbons, (i) solvent accessible surface; d, Protein structure quality check using a Ramachandran plot; e, An example of MD analysis: variation in kinetic energy, potential energy, total energy and temperature during simulation; f, An example of pharmacophore fitting; g, Alignments produced by a BioSuite-derived pharmacophore model; and h, An example of a field fit alignment: molecular similarity between a pair of molecules is calculated by using the Gaussian function in BioSuite.
Simulations
The ‘Simulations’ module essentially simulates the behaviour of a
molecule in terms of its three-dimensional structure. The sub-modules
covered are Forcefield, Energy minimization, Molecular dynamics, Monte
Carlo simulations and Electrostatics. The molecular simulation of a
system can conceptually be broken into three components: (a)
generating a computational description of a biological/chemical
system, typically in terms of atoms, molecules and associated force
field parameters, (b) the numerical solution of the equations which
govern their evolution, and (c) the application of statistical
mechanics to relate the behaviour of a few individual atoms/molecules
to the collective behaviour of the very many.
BioSuite is compatible with both the AMBER and the CHARMM force fields
for macromolecules (proteins, nucleic acids and carbohydrates) and
uses GAFF for small molecules (e.g. natural substrates, drugs and
drug-like substances). For each of the force fields, both treatments
of the dielectric, either constant or distance-dependent, are
provided.
Several algorithms for first-order unconstrained energy minimization
are contained in this module, providing a wide range of line search
options. Thus, the coordinates of the molecular system can be adjusted
so as to lower its energy, relative to the starting conformation, by
using one of the following minimizers: the steepest descent algorithm
and conjugate gradient methods, namely the Fletcher–Reeves,
Polak–Ribière, Polak–Ribière plus and Shanno algorithms.
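The difference between these first-order minimizers lies mainly in how the next search direction is built from successive gradients. The Python sketch below shows this for steepest descent, Fletcher–Reeves and Polak–Ribière plus on a toy two-variable energy surface. It is an illustration only: the function names, the backtracking (Armijo) line search, the restart rule and the toy energy function are assumptions, and nothing here reflects the line search options actually offered by BioSuite.

```python
def minimize(f, grad, x0, method="PR+", tol=1e-8, max_iter=10000):
    """First-order minimizer; method is 'SD', 'FR' or 'PR+'.

    Uses a simple backtracking line search (an assumption of this sketch).
    Returns the final coordinates and the number of iterations used.
    """
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                      # start along steepest descent
    for iteration in range(max_iter):
        g_sq = sum(gi * gi for gi in g)
        if g_sq ** 0.5 < tol:
            return x, iteration
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:                       # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -g_sq
        step, fx = 1.0, f(x)
        while step > 1e-16:                    # halve the step until energy drops enough
            trial = [xi + step * di for xi, di in zip(x, d)]
            if f(trial) <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        x = trial
        g_new = grad(x)
        if method == "SD":
            beta = 0.0
        elif method == "FR":                   # Fletcher-Reeves
            beta = sum(gn * gn for gn in g_new) / g_sq
        else:                                  # Polak-Ribiere plus (clipped at zero)
            beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / g_sq)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, max_iter

# Toy "energy": an elongated harmonic well with its minimum at the origin.
def toy_energy(x):
    return 0.5 * (x[0] ** 2 + 25.0 * x[1] ** 2)

def toy_gradient(x):
    return [x[0], 25.0 * x[1]]
```

For example, `minimize(toy_energy, toy_gradient, [1.0, 1.0], method="SD")` and the same call with `method="PR+"` both approach the origin; in a molecular setting `x` would hold all atomic coordinates and `grad` would return the negative of the forces.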
Further, to carry out molecular dynamics (MD) simulations, BioSuite
provides the NVE (micro-canonical), NVT (canonical) and NPT
(isobaric–isothermal) ensembles, with a choice of the velocity Verlet
or leapfrog integrator. BioSuite also provides options for using
SHAKE and RATTLE constraints.
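The velocity Verlet scheme named above can be sketched in a few lines of Python for a single particle in one dimension. This is only an illustration under simplifying assumptions: the function names and the harmonic test force are hypothetical, there is no thermostat or barostat (so it corresponds to an NVE run), and a real MD code would apply the same update to every atom using force field forces.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation for one particle in 1D.

    Returns the trajectory as a list of (position, velocity) pairs.
    """
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt    # position update with current acceleration
        a_new = force(x) / mass               # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt        # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

def harmonic_force(x, k=1.0):
    """Force of a harmonic spring, F = -k x (toy system)."""
    return -k * x

def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x
```

Running `velocity_verlet(1.0, 0.0, harmonic_force, 1.0, 0.01, 1000)` and evaluating `total_energy` along the trajectory shows the total energy fluctuating within a narrow band without drifting, which is the behaviour expected of this integrator in the micro-canonical ensemble.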
MD being a deterministic approach, in which the state of the system at
any future time can be predicted from its current state, the tools
provided in the suite can be used for solving Newton’s equations of
motion for a given initial conformation, to study how the system
evolves over time. Several intuitive and user-friendly tools are
provided to analyse the resulting trajectories, or time series of
conformations. For example (Figure 2 e), plots of the various energy
terms along with the temperature can be obtained. Plots generated with
user-defined parameters are presented alongside the structure in two
adjacent panels, which helps to view the position of the molecule at a
given energy or temperature. The Monte Carlo method, which generates
configurations randomly and uses a special set of criteria to decide
whether or not to accept each new configuration, is also part of this
module.
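A common choice for such an acceptance rule is the Metropolis criterion, which always accepts moves that lower the energy and accepts uphill moves with probability exp(-ΔE/kT). The text does not say which criterion BioSuite uses, so the Python sketch below assumes Metropolis; the function names, the one-dimensional harmonic toy system and the move size are likewise illustrative assumptions.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng):
    """Metropolis rule: accept downhill moves, and uphill moves with prob. exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def sample_harmonic(n_steps=100000, kT=1.0, k=1.0, max_move=2.0, seed=1):
    """Sample positions of a particle in the potential E = k x^2 / 2 (toy system)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)     # random trial configuration
        delta_e = 0.5 * k * trial * trial - 0.5 * k * x * x
        if metropolis_accept(delta_e, kT, rng):
            x = trial                                     # move accepted
        samples.append(x)                                 # rejected moves recount the old state
    return samples
```

For this toy system the sampled mean of x² should approach kT/k, which gives a simple check that the acceptance rule reproduces the Boltzmann distribution.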
In the electrostatics sub-module, BioSuite provides a solution of the
linear Poisson–Boltzmann equation, to enable modelling of the
contributions of solvent, counterions and protein charges to
electrostatic fields in molecules. Four choices of boundary
conditions, namely zero, partial coulombic, full coulombic and
focusing, are provided. For charge distribution, there are two
options: trilinear and uniform. BioSuite has a fast successive
over-relaxation (SOR) solver, which utilizes spectral radius
calculations to speed up convergence.
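The role of the spectral radius in an SOR solver can be seen in a deliberately small Python sketch. It solves the one-dimensional Poisson problem -u'' = 1 with zero boundary values, not the three-dimensional linear Poisson–Boltzmann equation treated by BioSuite, and the function names are hypothetical. For this model problem the spectral radius of the Jacobi iteration is known in closed form, and the classical formula ω = 2/(1 + sqrt(1 - ρ²)) gives the relaxation factor.

```python
import math

def solve_poisson_sor(n, omega, tol=1e-10, max_sweeps=100000):
    """Solve -u'' = 1 on (0, 1) with u(0) = u(1) = 0 using n interior grid points.

    omega = 1 gives Gauss-Seidel; 1 < omega < 2 gives over-relaxation.
    Returns the grid values (including boundaries) and the number of sweeps.
    """
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for sweep in range(1, max_sweeps + 1):
        biggest = 0.0
        for i in range(1, n + 1):
            gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h)   # local solve of the stencil
            change = omega * (gauss_seidel - u[i])               # over-relaxed correction
            u[i] += change
            biggest = max(biggest, abs(change))
        if biggest < tol:
            return u, sweep
    return u, max_sweeps

def optimal_omega(n):
    """Relaxation factor from the Jacobi spectral radius rho = cos(pi / (n + 1))."""
    rho = math.cos(math.pi / (n + 1))
    return 2.0 / (1.0 + math.sqrt(1.0 - rho * rho))
```

With 50 interior points, `solve_poisson_sor(50, optimal_omega(50))` needs far fewer sweeps than the Gauss–Seidel run `solve_poisson_sor(50, 1.0)`, and both reproduce the exact solution u(x) = x(1 - x)/2 at the grid points.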
Drug design
This module provides the following functionalities: (a) prediction of
biological activities of unknown chemical entities using QSAR, (b)
identification of pharmacophores in biologically active molecules, (c)
superimposition of a set of molecules in 3D space by alignment, and
(d) identification of the poses of a ligand in 3D space when it binds
to a target, using docking. Using the functionalities provided in the
drug design module, one can identify lead-like molecules from a set of
molecules, redesign them and predict their activities. Thus, lead
optimization can be achieved iteratively. If the target structure is
known, then lead optimization can be done using a structure-based
method such as docking.
The process of aligning a set of molecules in three-dimensional space,
to find the superimposable regions of a group of molecules or to
estimate molecular similarity, can be performed by using either the
‘Field fitting’ or the ‘RMS fitting’ approach. Field fitting aligns
molecules using their electrostatic potentials and steric shapes,
starting from their atomic coordinates and charges and using Gaussian
functions, while RMS fitting minimizes the distances between specified
atoms in the molecules. Flexible superposition can also be achieved by
allowing rotations about single bonds.
For deriving and matching ‘3D-pharmacophores’, the following features
are extracted and used: (a) hydrogen bond donor, (b) hydrogen bond
acceptor, (c) aliphatic hydrophobic group, (d) aromatic ring, (e)
negatively charged group, and (f) positively charged group.
Pharmacophores are identified by using configurations of features
common to a set of molecules. The pharmacophoric configurations are
identified by a pruned exhaustive search, starting with small sets of
features and extending them until no larger common configuration
exists.
To carry out QSAR, which seeks consistent relationships between
variations in the values of molecular properties and the biological
activity for a series of compounds, so that these ‘rules’ can be used
to evaluate new chemical entities, a series of widely accepted feature
extraction and statistical tools are provided within BioSuite. For
example, a 2D-QSAR calculation uses either one or a combination of (a)
electronic, (b) spatial, (c) structural, (d) thermodynamic and (e)
topological descriptors. BioSuite has the ability to compute 89
different descriptors. A few representative descriptors from different
classes, e.g. polarizability, HOMO and LUMO (electronic), Hf, log P
and MR (thermodynamic), etc., were compared with those computed from
standard software, using a dataset of 33 isoxazoles as potential
thrombin receptor antagonists; in general, a high correlation (>0.9)
was observed for the descriptor values.
Creating and refining a training set required for QSAR predictions is
aided by (a) K-means, (b) K-nearest neighbours or (c) UPGMA
hierarchical clustering algorithms. Tools are also provided for
building user-defined data sets/training sets as well as for searching
chemical databases. The QSAR model can be generated using regression
techniques such as multiple linear regression or partial least
squares. If linearly dependent descriptors have to be eliminated while
generating the model, a dimensionality reduction can be performed by
using either (a) principal component analysis or (b) discriminant
analysis. Validation of the generated model, to check its accuracy,
can be performed by the K-fold cross-validation technique.
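The K-fold cross-validation step can be illustrated with a small Python sketch that fits a one-descriptor linear model and reports the cross-validated q². The single descriptor, the interleaved assignment of compounds to folds, the use of the overall mean activity in the denominator and the function names are assumptions made for brevity; a real QSAR model would use several descriptors with multiple linear regression or partial least squares.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def k_fold_q2(xs, ys, k=5):
    """Cross-validated q^2 = 1 - PRESS / SS_total over k interleaved folds."""
    n = len(xs)
    press = 0.0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        # Fit on the training compounds only, then predict the held-out ones.
        slope, intercept = fit_line([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])
        press += sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx)
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total
```

A q² close to 1 indicates that activities of held-out compounds are predicted well, whereas a value much lower than the fitted r² is a common sign of over-fitting.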
The structure-based drug design sub-module contains
algorithms and utilities required for carrying out molecular
docking. Using either simulated annealing or genetic al-
gorithms (GA) based technique, the ligand conformations
are searched and docked into the binding site of the macro-
molecule. In a simulated annealing-based method, the
ligand’s current position, orientation and conformation
are changed during each cycle, to reach the most energe-
tically favourable conformation of the ligand bound to the
target macromolecule. Thus these algorithms predict both
the lowest energy conformation of the bound ligand as
well as the best position and orientation for its binding to
the target molecule, within the realm of the scientific ca-
pabilities of the approach.
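The simulated annealing cycle described above can be sketched on a toy problem. Everything here is an assumption made for illustration: the "ligand" state is reduced to one position coordinate and one torsion angle, `toy_energy` is a made-up stand-in for a docking score, and the cooling schedule and step sizes are arbitrary. The Metropolis acceptance rule is the standard one for simulated annealing; the text does not specify BioSuite's exact scheme.

```python
import math
import random

def toy_energy(state):
    """Hypothetical score (lower is better); minimum at x = 2.0, torsion = 60 degrees."""
    x, torsion = state
    return (x - 2.0) ** 2 + 1.0 - math.cos(math.radians(torsion - 60.0))

def perturb(state, rng, step_x=0.5, step_torsion=30.0):
    """Randomly change position and torsion, as done in each annealing cycle."""
    x, torsion = state
    new_x = x + rng.uniform(-step_x, step_x)
    new_torsion = (torsion + rng.uniform(-step_torsion, step_torsion)) % 360.0
    return (new_x, new_torsion)

def simulated_annealing(start, rng, t_start=5.0, t_end=0.01,
                        cooling=0.95, moves_per_cycle=50):
    current, e_current = start, toy_energy(start)
    best, e_best = current, e_current
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_cycle):
            candidate = perturb(current, rng)
            e_candidate = toy_energy(candidate)
            delta = e_candidate - e_current
            # Metropolis criterion: always accept downhill moves, accept
            # uphill moves with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        temperature *= cooling  # geometric cooling schedule
    return best, e_best

best_state, best_energy = simulated_annealing((0.0, 0.0), random.Random(0))
print(f"best state {best_state}, energy {best_energy:.4f}")
```

In a real docking run the state would hold the ligand's translation, orientation and all rotatable torsions, and the energy would come from a force field or scoring function.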
A second popular docking algorithm, based on genetic
algorithms, is also provided. The conformations of the
ligand are encoded as chromosomes. Crossover and
mutation operators bring about random changes in the
conformations of the ligand, and a fitness function
is defined for calculating the energy of the conformations
generated. Over a number of runs of the GA cycle, a
conformation having minimum energy is obtained.
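The GA cycle above can be illustrated with a toy search over torsion angles. This sketch makes several assumptions not stated in the text: a chromosome is a plain list of torsion angles, the energy function and its `TARGET` optimum are invented, selection is by tournament, and the best individual is carried over unchanged (elitism). It shows the encode, crossover, mutate and evaluate loop only and is not BioSuite's docking code.

```python
import math
import random

TARGET = [60.0, 180.0, 300.0]  # hypothetical optimal torsions, in degrees

def energy(chromosome):
    """Made-up fitness function: zero when every torsion matches TARGET."""
    return sum(1.0 - math.cos(math.radians(a - t))
               for a, t in zip(chromosome, TARGET))

def crossover(parent_a, parent_b, rng):
    """One-point crossover between two chromosomes."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rng, rate=0.2, max_shift=30.0):
    """Randomly shift some torsions by up to max_shift degrees."""
    return [(a + rng.uniform(-max_shift, max_shift)) % 360.0
            if rng.random() < rate else a
            for a in chromosome]

def tournament(population, rng, size=3):
    """Pick the lowest-energy individual out of a random subset."""
    return min(rng.sample(population, size), key=energy)

def run_ga(rng, pop_size=40, generations=60):
    population = [[rng.uniform(0.0, 360.0) for _ in TARGET]
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(population, key=energy)]  # elitism: keep the best
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    best = min(population, key=energy)
    return best, energy(best)

best, best_energy = run_ga(random.Random(1))
print(f"best torsions {best}, energy {best_energy:.4f}")
```

Because the best individual always survives, the lowest energy in the population can never increase from one generation to the next.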
GENERAL ARTICLES
CURRENT SCIENCE, VOL. 92, NO. 1, 10 JANUARY 2007 36
Conformation search functionality generates the con-
formations for an input molecule, clusters the conforma-
tions and displays energy and torsion angle values of low
energy conformations. This application generates confor-
mations using two different methods, namely random
conformation search and systematic conformation search.
The random conformation search uses the simulated an-
nealing algorithm. An option is provided to the user to se-
lect the rotatable bonds in the molecule. A few sample
results from the drug-design modules are presented in
Figure 2 f–h.
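A systematic conformation search, as opposed to the random one, enumerates every combination of torsion angles on a fixed grid. A minimal sketch, assuming a 30° grid and a made-up threefold torsional potential in place of a real force field; both function names are hypothetical:

```python
import itertools
import math

def torsion_energy(angles_deg):
    """Hypothetical threefold potential with minima at 60, 180 and 300 degrees."""
    return sum(1.0 + math.cos(math.radians(3.0 * a)) for a in angles_deg)

def systematic_search(n_rotatable, step_deg=30):
    """Enumerate all torsion combinations on a grid, sorted by energy."""
    grid = range(0, 360, step_deg)
    conformers = [(torsion_energy(c), c)
                  for c in itertools.product(grid, repeat=n_rotatable)]
    conformers.sort()
    return conformers

# Two user-selected rotatable bonds on a 30-degree grid: 12 * 12 conformers.
results = systematic_search(2)
for e, torsions in results[:3]:
    print(f"energy {e:.3f} at torsions {torsions}")
```

The number of conformers grows as (360/step)^n with the number of rotatable bonds n, which is why letting the user restrict the set of rotatable bonds matters and why a random search is offered as an alternative.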
Performance evaluation
Evaluation has been an integral part of the entire deve-
lopment process. To start with, the choice of modules and
the choice of algorithms themselves were evaluated, both
at TCS and by the academic partners. The pseudo-codes
and the SRS documents were then verified, followed by
verification of the software codes by the TCS team. The
scientific performance of the algorithms at various stages
(versions 0.3, 0.7, 1.0a and 1.0) was evaluated independ-
ently by the academic partners at their institutions and
any bugs reported or improvements suggested were sub-
sequently considered and implemented into the suite,
where appropriate. The outputs of each program were
compared with those of other established academic codes/
commercial packages, to verify the scientific performance.
They were also compared with the latest implementations
of the chosen algorithms in the public domain, where
available. The performance has been found to be compa-
rable in all cases. While the utilities of many of the indi-
vidual programs have been enhanced while implementing
in BioSuite, the scientific capabilities and limitations of
each of the programs are bounded by those of the corre-
sponding original algorithms cited in Table 2.
An example of the manner in which the scientific
performance was evaluated is given below. For testing the
drug design module, 42 thymidine monophosphate kinase
inhibitors were taken and energy minimization was performed
using both AMBER and CHARMM force fields with the
conjugate gradient method. Conformational searches
were tested with both systematic and randomized search
methods. Alignments were satisfactory and low RMSD values
were obtained for similar molecules, comparable to
those obtained in Cerius. The computation time was
found to be comparable to that of competing
software. The docking procedure is simple and
user-friendly.
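The RMSD values mentioned above measure how closely two sets of atomic coordinates agree. A minimal sketch, assuming the two structures have the same atoms in the same order and have already been superimposed (no optimal superposition is performed here); the coordinates are illustrative only:

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-atom fragments, coordinates in angstroms.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
aligned = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (1.5, 1.6, 0.0)]
print(f"RMSD = {rmsd(reference, aligned):.3f}")
```

For molecules that have not been superimposed, a least-squares fitting step would be needed before this calculation.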
Prominent features of the package
For the most part, the existing software packages evolved
out of academia, and were implementations of algorithms
developed at different places and different times by dif-
ferent persons. As such, often there is no single ‘super-
structure’ into which the algorithms fit seamlessly. To
overcome these issues, BioSuite has been written in a
modular fashion, which would permit the easy implemen-
tation of new algorithms as and when they are discovered.
The unique partnership of industry with academia
harnesses the strengths of both communities, leading
to a superior product both scientifically and in terms
of software engineering standards. Some of the unique
features of BioSuite are: (a) It is comprehensive, containing
programs for carrying out sequence, whole genome and
structure analysis and drug design, all under a common
framework. (b) The software runs on simple personal
computers on a Linux platform. (c) Domain identification
and domain searching tools are available. (d) Transmembrane
beta strand prediction is provided, along with enhanced
capability in building molecules in terms of the number of
secondary structure templates available. (e) Enhanced
capability in building larger carbohydrate structures.
(f) The code has been written afresh to CMMi-5 standards,
with consistent coding methods that incorporate versatility
in each program making up the suite, keeping in view the
genome-scale operations in bioinformatics.
Roadmap for the future
Going forward, several features are planned to be added
to BioSuite to make it an even more useful platform for
scientific research. Some developments in the pipeline
are described below:
ADME
The Absorption, Distribution, Metabolism and Excretion
profile (ADME) of a drug is an important determinant of
its therapeutic efficacy. Accurately modelling the ADME
properties of a candidate drug molecule is a necessary
step to increase the chances that it will eventually become
a successful drug. In the recent past, models have been
developed for estimating various ADME-related proper-
ties such as blood-brain barrier penetration, human intes-
tinal absorption, binding affinity to human serum albumin
and Caco-2 cell permeability. These will be integrated
into the existing QSAR module of BioSuite.
Flexible docking
Docking, in BioSuite 1.0, explores the energetically opti-
mal fit of a flexible small molecule with a rigid protein
molecule. In subsequent releases, an improved version of
the docking algorithm will be implemented that allows
restricted flexibility in the protein molecule as well. This
has been shown to be useful in improving the accuracy in
prediction of the optimal binding conformation.
De novo drug design
An important requirement for drug design is the ability to
generate novel molecules that bind to a known active site.
Implementation of an algorithm is underway for the gene-
ration of novel binding candidates using a strategy of frag-
ment docking followed by elaboration of selected fragments.
tRNA identification
A procedure for identifying tRNA genes in a genome will
be included in the next version of BioSuite. The program
identifies tRNAs based on the recognition of two intra-
genic control regions known as A and B boxes, a highly
conserved part of B box, a transcription termination signal,
and the evaluation of the spacing between these elements.
Improved whole genome comparison
MUMmer is an open source software package for the
rapid alignment of very large DNA and amino acid se-
quences. A newer version of the MUMmer package has
been integrated in BioSuite to find maximal unique
matches between two genomes. The MUMmer output can
also be viewed in the dot-plot format.
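The notion of a maximal unique match can be made concrete with a small example. The sketch below is a naive quadratic-time search written only to illustrate the definition: a match that occurs exactly once in each sequence and cannot be extended on either side. It is not how MUMmer works internally (MUMmer relies on suffix-based data structures), and the function names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def maximal_unique_matches(a, b, min_len=3):
    """Return (pos_in_a, pos_in_b, length) for every maximal unique match."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip positions where the match could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums

print(maximal_unique_matches("GATTACAGG", "CCATTACATT"))
```

Plotting each reported match as a diagonal segment starting at (pos_in_a, pos_in_b) gives the dot-plot style view mentioned above.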
Improved graphics
Several techniques are being implemented to enhance the
quality of the 3D graphics display in BioSuite while
speeding up the display.
Scripting interface
While BioSuite provides a number of features and a vast
array of functionality, users might want to implement their
own procedures and programs. For this purpose, a script-
ing interface that exposes the functionality in BioSuite
will be provided so that users can create their own workflows,
develop and test new ideas and automate several tasks.
Sketcher
The next version of BioSuite will include a 2D sketcher
for drawing molecules in a manner familiar to chemists
and for automatically generating 3D structures for
the molecules.
A high-performance version called Bio-Cluster for some
of the memory intensive applications is also planned.
Hardware requirements and documentation
The minimum hardware requirements for BioSuite are as
follows: Intel-compatible x86 processor, 1.5 GHz; 256 MB
RAM; 3 GB free hard disk space; display capable of
1280 × 1024 pixel resolution; high-end graphics card
with 3D support for better viewing; Red Hat Linux 8.0 or
9.0 or Fedora Core 1/2 operating systems. BioSuite comes
with its own set of documentation. The entire package is
well documented and comes with easy to use tutorials,
which reduce the learning curve and increase efficiency.
Detailed documentation is available at the BioSuite web-
site: http://www.atc.tcs.co.in/BioSuite/.
1. Huang, X., A contig assembly program based on sensitive detec-
tion of fragment overlaps. Genomics, 1992, 14, 18–25.
2. Schuler, G. D., Sequence mapping by electronic PCR. Genome
Res., 1997, 7, 541–550.
3. Delcher, A. L., Harmon, D., Kasif, S., White, O. and Salzberg,
S. L., Improved microbial gene identification with GLIMMER.
Nucleic Acids Res., 1999, 27, 4636–4641.
4. Kleffe, J., Hermann, K., Vahrson, W., Wittig, B. and Brendel, V.,
Logitlinear models for the prediction of splice sites in plant pre-
mRNA sequences. Nucleic Acids Res., 1996, 24, 4709–4717.
5. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman,
D. J., Basic local alignment search tool. J. Mol. Biol., 1990, 215,
403–410.
6. Needleman, S. B. and Wunsch, C. D., A general method applica-
ble to the search for similarities in the amino acid sequence of two
proteins. J. Mol. Biol., 1970, 48, 443–453.
7. Smith, T. F. and Waterman, M. S., Identification of common mole-
cular subsequences. J. Mol. Biol., 1981, 147, 195–197.
8. Thompson, J. D., Higgins, D. G. and Gibson, T. J., CLUSTAL W:
improving the sensitivity of progressive multiple sequence align-
ment through sequence weighting, position-specific gap penalties
and weight matrix choice. Nucleic Acids Res., 1994, 22, 4673–4680.
9. Arthur, L. D., Kasif, S., Fleschmann, R. D., Peterson, J., White, O.
and Salzberg, S. L., Alignment of whole genomes. Nucleic Acids
Res., 1999, 27, 2369–2376.
10. Knuth, D. E., Morris, J. H. and Pratt, V. R., Fast pattern matching
in strings. SIAM J. Computing 1977, 6, 323–350.
11. Bailey, T. L. and Elkan, C., Unsupervised learning of multiple motifs
in biopolymers using expectation maximization. Machine Learn-
ing J., 1995, 21, 51–83.
12. SantaLucia, J. Jr., Allawi, H. T. and Seneviratne, P. A., Improved
nearest-neighbor parameters for predicting DNA duplex stability.
Biochemistry, 1996, 35, 3555–3562.
13. Zuker, M., On finding all suboptimal foldings of an RNA mole-
cule. Science, 1989, 244, 48–52.
14. Jones, D. T., Protein secondary structure prediction based on posi-
tion-specific scoring matrices. J. Mol. Biol., 1999, 292, 195–202.
15. Gromiha, M. M., Majumdar, R. and Ponnuswamy, P. K., Identifi-
cation of membrane spanning beta strands in bacterial porins. Pro-
tein Eng., 1997, 10, 497–500.
16. Durbin, R., Eddy, S., Krogh, A. and Mitchison, G., Biological
Sequence Analysis: Probabilistic Models of Proteins and Nucleic
Acids, Cambridge University Press, UK, 1998.
17. Mazumdar, A., Kolaskar, A. and Donald, S., GeneOrder: Compar-
ing the order of genes in small genomes. Bioinformatics, 2001, 17,
162–166.
18. Enright, A. J. and Ouzounis, C. A., GeneRAGE: a robust algorithm
for sequence clustering and domain detection. Bioinformatics,
2000, 16, 451–457.
19. Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. and
Eisenberg, D., A combined algorithm for genome-wide prediction
of protein function. Nature, 1999, 402, 83–86.
20. Tamura, K. and Nei, M., Estimation of the number of nucleotide
substitutions in the control region of mitochondrial DNA in humans
and chimpanzees. Mol. Biol. Evolut., 1993, 10, 512–526.
21. Fitch, W. M. and Margoliash, E., Construction of phylogenetic
trees. Science, 1967, 155, 279–284.
22. Bansal, M., Bhattacharyya, D. and Ravi, B., NUPARM and
NUCGEN: software for analysis and generation of sequence de-
pendent nucleic acid structures. Comput. Appl. Biosci., 1995, 11,
281–287.
23. Laskowski, R. A., MacArthur, M. W., Moss, D. S. and Thornton,
J. M., PROCHECK: a program to check the stereochemical quality
of protein structures. J. Appl. Cryst., 1993, 26, 283–291.
24. Sutcliffe, M. J., Haneef, I., Carney, D. and Blundell, T. L., Knowl-
edge based modeling of homologous proteins, Part I: Three
dimensional frameworks derived from the simultaneous superposi-
tion of multiple structures. Protein Eng., 1987, 1, 377–384.
25. Baker, E. N. and Hubbard, R. E., Hydrogen bonding in globular
proteins. Progr. Biophys. Mol. Biol., 1984, 44, 97–179.
26. Zhang, C. and Kim, S., Environment-dependent residue contact
energies for proteins. Proc. Nat. Acad. Sci., 2000, 97, 2550–2555.
27. Orengo, C. A. and Taylor, W. R., SSAP: Sequential structure
alignment program for protein structure comparison. Methods En-
zymol., 1996, 266, 617–635.
28. Connolly, M. L., Computation of molecular volume. J. Am. Chem.
Soc., 1985, 107, 1118–1124.
29. Brady, G. P. and Stouten, F. W. P., Fast prediction and visualiza-
tion of protein binding pockets with PASS. J. Computer-Aided
Mol. Design, 2000, 14, 383–401.
30. Lichtarge, O., Bourne, H. R. and Cohen, F. E., An evolutionary
trace method defines binding surfaces common to protein families.
J. Mol. Biol., 1996, 257, 342–358.
31. Gilbert, J. C. and Nocedal, J., Global convergence properties of
conjugate gradient methods for optimization. SIAM J. Optimiza-
tion, 1992, 2, 21–42.
32. Watowich, S. J., Meyer, E. S., Hagstrom, R. and Josephs, R., A
stable, rapidly converging conjugate gradient method for energy
minimization. J. Computat. Chem., 1988, 9, 650–661.
33. Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C.,
Alagona, G., Profeta, S. Jr. and Weiner, P. K., A new force field
for molecular mechanical simulation of nucleic acids and proteins.
J. Am. Chem. Soc., 1984, 106, 765–784.
34. Jayaram, B., Sharp, K. A. and Honig, B., The electrostatic poten-
tial of B-DNA. Biopolymers, 1989, 28, 975–993.
35. Nicholls, A. and Honig, B., A rapid finite difference algorithm,
utilizing successive over-relaxation to solve the Poisson–Boltz-
mann equation. J. Computat. Chem., 1991, 12, 435–445.
36. Andersen, H. C., Molecular dynamics simulations at constant pres-
sure and/or temperature. J. Chem. Phys., 1980, 72, 2384–2393.
37. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., Di-
Nola, A. and Haak, J. R., Molecular dynamics with coupling to an
external bath. J. Chem. Phys., 1984, 81, 3684–3690.
38. Morris, G. M., Goodsell, D. S., Halliday, R. S., Huey, R., Hart, W.
E., Belew, R. K. and Olson, A. J., Automated docking using a
Lamarckian genetic algorithm and empirical binding free energy
function. J. Computat. Chem., 1998, 19, 1639–1662.
39. Goodman, J. M., Chemical Applications of Molecular Modelling,
The Royal Society of Chemistry, London, 1998, pp. 61–69.
40. Good, A. C., Hodgkin, E. E. and Richards, W. G., Utilization of
Gaussian functions for the rapid evaluation of molecular simila-
rity. J. Chem. Inf. Comput. Sci., 1992, 32, 188.
41. Jones, G., Willet, P. and Glen, R. C., A genetic algorithm for
flexible molecular overlay and pharmacophore elucidation. J.
Comput.-Aided Mol. Des., 1995, 9, 532.
42. Kurogi, Y. and Guner, O. F., Pharmacophore modeling and three-
dimensional database searching for drug design using catalyst.
Curr. Med. Chem., 2001, 8, 1035–1055.
Received 17 September 2005; revised accepted 26 October 2006