TransportTools: a library for high-throughput analyses of internal voids in biomolecules and ligand transport through them
Jan Brezovsky¹,²,*, Aravind Selvaram Thirunavukarasu¹,², Bartlomiej Surpeta¹,², Carlos Eduardo Sequeiros-Borja¹,², Nishita Mandal¹,², Dheeraj Kumar Sarkar¹,², Cedrix J. Dongmo Foumthuim¹,² and Nikhil Agrawal¹,²
¹Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland, and ²International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland.
*To whom correspondence should be addressed.
Abstract
Information regarding pathways through voids in biomolecules and their roles in ligand transport is critical to our understanding of the function of many biomolecules. Recently, the advent of high-throughput molecular dynamics simulations has enabled the study of these pathways and of rare transport events. However, the scale and intricacy of the resulting data require dedicated tools to conduct analyses efficiently and without excessive demands on users. To fill this gap, we developed TransportTools, a library that enables the investigation of pathways and their utilization across large simulated datasets. TransportTools also facilitates the development of custom-made analyses. TransportTools is implemented in Python3 and distributed as pip and conda packages. The source code is available at https://github.com/labbit-eu/transport_tools.
1. Introduction
At any moment, living systems contain thousands of small organic molecules that need to reach their sites of action to exert their function. The transport of these molecules within the cell (and beyond) is governed primarily by channels and tunnels (henceforth referred to as ‘pathways’) formed from the internal voids of biomolecules [1]. These pathways enable the transport of ions and small molecules between different regions, connecting inner cavities with the surface, two different cavities with each other, or, in transmembrane proteins, different cellular environments. Given these roles, the investigation of pathways is critical to drug discovery [2] and protein engineering [3] initiatives. Because pathways are often equipped with dynamic gates [4], they are mostly transient and therefore challenging to study.
One of the most common approaches used to characterize these rare events of ligand passage through transiently open pathways is to run molecular dynamics (MD) simulations [5] and then analyze the pathway dynamics with tools such as CAVER [6] or track ligand migration through the biomolecules with AQUA-DUCT [7]. The intensive development of computing hardware and sampling algorithms over recent years has led to considerable growth in the size and complexity of the datasets typically generated for a single protein system; it is not uncommon for such datasets to consist of thousands of simulations. Such high-throughput approaches, however, impose a substantial burden on researchers, who must establish the identity of the pathways observed across all simulations, determine which pathways are used by particular ligands, and develop the means for specific quantitative analyses. To this end, we present TransportTools, a library designed to alleviate these difficulties by providing easy, efficient access to comprehensive details on transport processes, even for large-scale simulation sets, and by offering an environment for the development of novel analyses and tools.

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.01.445451; this version posted June 2, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
2. Features
TransportTools is a Python3 module distributed under the GNU General Public License v3.0 and is available for Linux as the transport_tools package via the pip and conda package managers. In its standard workflow (Fig. 1 and Supplementary File 1), TransportTools uses outputs from CAVER and AQUA-DUCT analyses of MD simulations, integrating their complementary insights to investigate transport pathways and the corresponding ligand migration. To achieve efficiency in such a high-throughput regime, the raw data on pathway ensembles and ligand-transport events are first coarse-grained and positioned on a spherical grid. Next, TransportTools identifies relationships between the pathway ensembles from individual simulations and joins them into superclusters, to which the ligand-transport events are then assigned. Critical analysis parameters can be controlled via a configuration file. These parameters are thoroughly explained in the user guide, which also includes a detailed walk-through tutorial (Supplementary File 2). Aside from the ready-made workflow, the library offers many classes for processing, manipulating, and analyzing pathways and events, which simplifies the production of custom-made analyses and will, we hope, stimulate the development of new packages (Supplementary File 3).
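The coarse-graining, superclustering, and event-assignment steps described above can be illustrated with a minimal, dependency-free Python sketch. It is not TransportTools' implementation or API: all function names are hypothetical, pathways and ligand trajectories are assumed to be plain lists of 3D points, a cubic voxel grid stands in for the spherical grid used by the library, and the Jaccard distance with average-linkage merging is our own illustrative choice of similarity measure and clustering rule.

```python
from itertools import combinations
from typing import Optional

# Hypothetical representations: a pathway or ligand trajectory is a list of 3D points,
# and its coarse-grained form is the set of cubic grid cells (voxels) it visits.
Point = tuple[float, float, float]
Voxel = tuple[int, int, int]


def coarse_grain(path: list[Point], spacing: float = 1.0) -> frozenset[Voxel]:
    """Snap every point of a pathway or ligand trajectory to a cubic grid cell."""
    return frozenset(tuple(int(c // spacing) for c in p) for p in path)


def jaccard_distance(a: frozenset[Voxel], b: frozenset[Voxel]) -> float:
    """0.0 for identical voxel sets, 1.0 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def supercluster(pathways: dict[str, frozenset[Voxel]], cutoff: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering of pathways pooled from all simulations."""
    clusters = [{name} for name in pathways]

    def avg_dist(c1: set[str], c2: set[str]) -> float:
        total = sum(jaccard_distance(pathways[x], pathways[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Closest pair of current clusters (i < j because of combinations()).
        d, i, j = min(
            (avg_dist(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters


def assign_event(event_path: list[Point], superclusters: list[set[str]],
                 pathways: dict[str, frozenset[Voxel]], spacing: float = 1.0) -> Optional[int]:
    """Index of the supercluster sharing the most voxels with the event, or None."""
    voxels = coarse_grain(event_path, spacing)
    best, best_overlap = None, 0
    for idx, members in enumerate(superclusters):
        covered = frozenset().union(*(pathways[m] for m in members))
        overlap = len(voxels & covered)
        if overlap > best_overlap:
            best, best_overlap = idx, overlap
    return best


if __name__ == "__main__":
    # Toy data: two near-identical tunnels along x (from two simulations) and one along y.
    pathways = {
        "sim1:tunnel1": coarse_grain([(x + 0.5, 0.5, 0.5) for x in range(6)]),
        "sim2:tunnel1": coarse_grain([(x + 0.7, 0.6, 0.4) for x in range(6)]),
        "sim1:tunnel2": coarse_grain([(0.5, y + 0.5, 0.5) for y in range(6)]),
    }
    superclusters = supercluster(pathways, cutoff=0.5)
    print(superclusters)
    water = [(0.3, y + 0.2, 0.6) for y in range(4)]  # a water molecule moving along y
    print(assign_event(water, superclusters, pathways))  # → 1
```

In this toy setup, the two nearly identical tunnels from different simulations merge into one supercluster, the orthogonal tunnel stays separate, and the water trajectory running along the y axis is assigned to the latter. In practice, both the grid spacing and the clustering cutoff strongly affect the result and would need tuning.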
Outputs: The main results generated by TransportTools are presented as a set of tables stored in text files. These contain data on the composition of pathway superclusters, their geometrical properties and utilization by transport events, and critical protein residues. Using the generated scripts, the spatial representation of superclusters and assigned events can be visualized in PyMOL [8]. All results can be refined using various filters and split by individual simulation or by user-defined groups to facilitate convenient comparison.
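Because the results are plain-text tables, they can also be post-processed outside the library. The sketch below assumes a hypothetical tab-separated layout with `supercluster`, `simulation`, and `events` columns (the actual TransportTools files have their own layout, described in the user guide), drops rarely used superclusters, and sums events per user-defined group of simulations.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a results table (tab-separated); the real
# TransportTools output files use their own layout and column names.
TABLE = """supercluster\tsimulation\tevents
1\tmd1\t120
1\tmd2\t95
2\tmd1\t3
2\tmd3\t40
3\tmd2\t1
"""


def read_table(text: str) -> list[dict[str, str]]:
    """Parse a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def events_by_group(rows: list[dict[str, str]], groups: dict[str, str],
                    min_events: int = 0) -> dict[str, dict[str, int]]:
    """Sum events per (group, supercluster), skipping superclusters whose
    total number of events over all simulations is below min_events."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["supercluster"]] += int(row["events"])

    per_group = defaultdict(lambda: defaultdict(int))
    for row in rows:
        sc = row["supercluster"]
        if totals[sc] < min_events:
            continue  # filter out rarely used superclusters
        group = groups.get(row["simulation"], "ungrouped")
        per_group[group][sc] += int(row["events"])
    return {group: dict(counts) for group, counts in per_group.items()}


if __name__ == "__main__":
    groups = {"md1": "wild-type", "md2": "wild-type", "md3": "mutant"}
    print(events_by_group(read_table(TABLE), groups, min_events=10))
    # → {'wild-type': {'1': 215, '2': 3}, 'mutant': {'2': 40}}
```

Mapping each simulation to a group label mirrors the kind of wild-type versus mutant comparison described in the use cases; simulations missing from the mapping are collected under "ungrouped".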
Performance and limitations: The performance of TransportTools was analyzed on three datasets, each of 50 simulations of enzymes of up to 500 residues whose active sites differ in accessibility. This resulted in the detection of up to 50,000 water-transport events, which were processed within 2–21 hours on a standard workstation (Supplementary File 4). TransportTools inherits the limitations
of the CAVER and AQUA-DUCT packages, namely their descriptions of pathway geometries and the definitions of their clusters (see Section 2.2 of Supplementary File 2 for best-practice guidelines). When MD trajectories are used directly, only file formats supported by the pytraj package [9] can be processed.
Fig. 1. Schematic of a standard TransportTools analysis workflow.
Use cases: To illustrate the applicability of TransportTools, we applied it to three representative biological problems connected with ligand transport, using the established model enzymes DhaA and LinB from the haloalkane dehalogenase family [10,11]. First, we analyzed 15 simulations of DhaA to discover rare transient tunnels and their usage by water molecules (Supplementary File 5). Next, we examined the effects of mutations by contrasting simulations of wild-type LinB, the LinB32 mutant with a closed primary tunnel, and the LinB86 mutant with a de novo created tunnel (Supplementary File 6). Finally, we studied the substrate selectivity of the pathways leading to the active site of LinB86 in almost 600 simulations (Supplementary File 7).
3. Conclusions
The TransportTools library provides users with access to (i) efficient analyses of transport pathways
across extensive MD simulations, including those originating from massively parallel calculations or
very long simulations; (ii) integrated data regarding transport pathways and their actual utilization by
small molecules; and (iii) rigorous comparisons of transport processes under different settings, e.g. by
contrasting transport in an original system against the same system perturbed by mutations, different
solvents, or bound ligands.
Acknowledgments
Computations were performed at the Poznan Supercomputing and Networking Center.
Funding
This work was supported by the National Science Centre, Poland [grant nos. 2017/25/B/NZ1/01307 and
2017/26/E/NZ1/00548], by POWER projects [POWR.03.02.00-00-I022/16 and POWR.03.02.00-00-I006/17], and
by a grant of the Dean of Faculty of Biology, AMU [GDWB-05/2020].
Conflict of Interest: none declared.
References
(1) Kingsley, L. J.; Lill, M. A. Substrate Tunnels in Enzymes: Structure-Function Relationships and
Computational Methodology. Proteins 2015, 83 (4), 599–611. https://doi.org/10.1002/prot.24772.
(2) Marques, S. M.; Daniel, L.; Buryska, T.; Prokop, Z.; Brezovsky, J.; Damborsky, J. Enzyme Tunnels and
Gates As Relevant Targets in Drug Design. Med. Res. Rev. 2017, 37 (5), 1095–1139.
https://doi.org/10.1002/med.21430.
(3) Kokkonen, P.; Bednar, D.; Pinto, G.; Prokop, Z.; Damborsky, J. Engineering Enzyme Access Tunnels.
Biotechnol. Adv. 2019, 37 (6), 107386. https://doi.org/10.1016/j.biotechadv.2019.04.008.
(4) Gora, A.; Brezovsky, J.; Damborsky, J. Gates of Enzymes. Chem. Rev. 2013, 113 (8), 5871–5923.
https://doi.org/10.1021/cr300384w.
(5) Decherchi, S.; Cavalli, A. Thermodynamics and Kinetics of Drug-Target Binding by Molecular Simulation.
Chem. Rev. 2020, 120 (23), 12788–12833. https://doi.org/10.1021/acs.chemrev.0c00534.
(6) Jurcik, A.; Bednar, D.; Byska, J.; Marques, S. M.; Furmanova, K.; Daniel, L.; Kokkonen, P.; Brezovsky, J.;
Strnad, O.; Stourac, J.; Pavelka, A.; Manak, M.; Damborsky, J.; Kozlikova, B. CAVER Analyst 2.0: Analysis
and Visualization of Channels and Tunnels in Protein Structures and Molecular Dynamics Trajectories.
Bioinformatics 2018, 34 (20), 3586–3588. https://doi.org/10.1093/bioinformatics/bty386.
(7) Magdziarz, T.; Mitusińska, K.; Bzówka, M.; Raczyńska, A.; Stańczak, A.; Banas, M.; Bagrowska, W.; Góra,
A. AQUA-DUCT 1.0: Structural and Functional Analysis of Macromolecules from an Intramolecular
Voids Perspective. Bioinformatics 2020, 36 (8), 2599–2601.
https://doi.org/10.1093/bioinformatics/btz946.
(8) The PyMOL Molecular Graphics System, Version 2.0; Schrödinger, LLC, 2017.
(9) Roe, D. R.; Cheatham, T. E. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular
Dynamics Trajectory Data. J. Chem. Theory Comput. 2013, 9 (7), 3084–3095.
https://doi.org/10.1021/ct400341p.
(10) Pavlova, M.; Klvana, M.; Prokop, Z.; Chaloupkova, R.; Banas, P.; Otyepka, M.; Wade, R. C.; Tsuda, M.;
Nagata, Y.; Damborsky, J. Redesigning Dehalogenase Access Tunnels as a Strategy for Degrading an
Anthropogenic Substrate. Nat. Chem. Biol. 2009, 5 (10), 727–733.
https://doi.org/10.1038/nchembio.205.
(11) Brezovsky, J.; Babkova, P.; Degtjarik, O.; Fortova, A.; Gora, A.; Iermak, I.; Rezacova, P.; Dvorak, P.;
Smatanova, I. K.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Engineering a de Novo Transport Tunnel.
ACS Catal. 2016, 6 (11), 7597–7610. https://doi.org/10.1021/acscatal.6b02081.