Alexander Tropsha

University of North Carolina at Chapel Hill | UNC · Eshelman School of Pharmacy


441
Publications
90,456
Reads
34,114
Citations


Publications (441)
Article
Full-text available
Skin sensitization is a significant concern for chemical safety assessments. Traditional animal assays often fail to predict human responses accurately, and ethical constraints limit the collection of human data, creating a need for reliable in silico models for skin sensitization prediction. This study introduces HuSSPred, an in silico tool ba...
Preprint
Full-text available
Recent advances in DNA-encoded library (DEL) screening have created bioactivity datasets containing billions of molecules, unlocking new opportunities for machine learning (ML) in drug discovery. However, most ultra-large DEL libraries are proprietary, limiting the advancement of ML tools for big chemical data analytics and hindering the democratiz...
Preprint
Knowledge graphs (KGs) represent connections and relationships between real-world entities. We propose a link prediction framework for KGs named Enrichment-Driven GrAph Reasoner (EDGAR), which infers new edges by mining entity-local rules. This approach leverages enrichment analysis, a well-established statistical method used to identify mechanisms...
Preprint
Over the past several decades, reducing, refining, and replacing animal testing (the three R’s) has been a prominent goal in chemical toxicology. The STopTox (Systemic and Topical chemical Toxicity) platform was developed for this objective as an innovative in silico alternative to conventional animal testing for acute systemic and topical toxicity te...
Preprint
Full-text available
Helicases have emerged as promising targets for the development of antiviral drugs; however, the family remains largely undrugged. To support the focused development of viral helicase inhibitors we identified, collected, and integrated all chemogenomics data for all available helicases from the ChEMBL database. After thoroughly curating and enrichi...
Article
Full-text available
The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, as many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent...
Article
Heparan sulfate (HS), a sulfated polysaccharide abundant in the extracellular matrix, plays pivotal roles in various physiological and pathological processes by interacting with proteins. Investigating the binding selectivity of HS oligosaccharides to target proteins is essential, but the exhaustive inclusion of all possible oligosaccharides in mic...
Preprint
Full-text available
Expansive Matching of Experts (EMOE) is a novel method that utilizes support-expanding, extrapolatory pseudo-labeling to improve prediction and uncertainty based rejection on out-of-distribution (OOD) points. We propose an expansive data augmentation technique that generates OOD instances in a latent space, and an empirical trial based approach to...
Preprint
Treatment regimens, especially in cancer, often include more than one medicine in order to achieve durable outcomes. Identifying the optimal combination of treatments has historically been done through clinical trial and error. And for many conditions, such as pancreatic cancer, an optimal treatment protocol has remained elusive, and the best avail...
Preprint
Background: Understanding the potential prenatal and developmental toxicity hazards associated with the use of pharmaceutical and cosmetic products is an important component of women's health. This hazard can be estimated from the chemical structure of the respective agents using Quantitative Structure-Activity Relationship (QSAR) models; however, the development o...
Article
In deep learning for drug discovery, molecular representations are often based on sequences, known as SMILES, which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecula...
Chapter
Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases....
Article
Structure-based virtual screening (SBVS) is a key workflow in computational drug discovery. SBVS models are assessed by measuring the enrichment of known active molecules over decoys in retrospective screens. However, the standard formula for enrichment cannot estimate model performance on very large libraries. Additionally, current screening bench...
Article
Full-text available
Knowledge graphs are increasingly used in biomedical research to link large amounts of heterogeneous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by requirements to understand the underlying graph structure in terms of entity t...
Preprint
Traditional best practices for Quantitative Structure Activity Relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study challenges the conventional norms by recommending the use of models with the highest positive predictive value (PPV) built for imbalanced tra...
Article
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational...
Article
Recent rapid expansion of make‐on‐demand, purchasable chemical libraries comprising tens of billions or even trillions of molecules has challenged the efficient application of traditional structure‐based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using...
Article
Full-text available
We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24–25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and...
Article
In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template-based ligand docking program ClusPro ligTBM, also implemented as a public...
Article
Hits from high-throughput screening (HTS) of chemical libraries are often false positives due to their interference with assay detection technology. In response, we generated the largest publicly available library of chemical liabilities and developed "Liability Predictor," a free web tool to predict HTS artifacts. More specifically, we generated,...
Article
Full-text available
COVID-19 vaccines have been instrumental tools in the fight against SARS-CoV-2 helping to reduce disease severity and mortality. At the same time, just like any other therapeutic, COVID-19 vaccines were associated with adverse events. Women have reported menstrual cycle irregularity after receiving COVID-19 vaccines, and this led to renewed fears c...
Preprint
Full-text available
We introduce STOPLIGHT, a web portal to assist medicinal chemists in prioritizing hits from screening campaigns and selecting compounds for optimization. STOPLIGHT incorporates services to assess 6 physicochemical and structural properties, 6 assay liabilities, and 11 pharmacokinetic properties for any small molecule represented by its SMILES str...
Article
Understanding the origins of past and present viral epidemics is critical in preparing for future outbreaks. Many viruses, including SARS-CoV-2, have led to significant consequences not only due to their virulence, but also because we were unprepared for their emergence. We need to learn from large amounts of data accumulated from well-studied, pas...
Poster
Full-text available
Abstract: The COVID-19 pandemic, caused by the SARS-CoV-2 virus, exposed gaps in our nation's preparedness for both the rapid development of antiviral drugs and the broader public health response. The high transmissibility of this novel pathogenic respiratory virus underscored the need for broad-spectrum antiviral drug development. The Rapidly Emer...
Article
Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose. We combine these two approaches by training a pose scoring function in...
Preprint
Full-text available
Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose. We combine these two approaches by training a pose scoring function in...
Article
Diseases caused by new viruses cost thousands if not millions of human lives and trillions of dollars. We have identified, collected, curated, and integrated all chemogenomics data from ChEMBL for 13 emerging viruses that hold the greatest potential threat to global human health. By identifying and solving several challenges related to data annotat...
Preprint
Full-text available
Glycogen Synthase Kinase-3 beta (GSK-3β) is a validated target-enzyme associated with Alzheimer’s Disease (AD). Usage of allosteric inhibitors of this enzyme represents a valid and promising therapeutic strategy due to their selective and subtle modulation, with a low probability of producing side effects. Nonetheless, only a few GSK-3β allosteric...
Article
In the wake of the recent COVID-19 pandemic, scientists around the world rushed to deliver numerous CADD (Computer-Aided Drug Discovery) methods and tools that could be reliably used to discover novel drug candidates against the SARS-CoV-2 virus. With that, there emerged a trend of a significant democratization of CADD that contributed to the rapid dev...
Preprint
Full-text available
Recent attempts at utilizing deep learning for structure-based virtual screening have focused on training models to predict binding affinity from protein-ligand complexes with known crystal structures. The PDBbind dataset is the current standard for training such models, but its small size (less than 20K binding affinity measurements) leads to mode...
Article
Full-text available
Deep generative neural networks have been used increasingly in computational chemistry for de novo design of molecules with desired properties. Many deep learning approaches employ reinforcement learning for optimizing the target properties of the generated molecules. However, the success of this approach is often hampered by the problem of sparse...
Article
Exogenous metal particles and ions from implant devices are known to cause severe toxic events with symptoms ranging from adverse local tissue reactions to systemic toxicities, potentially leading to the development of cancers, heart conditions, and neurological disorders. Toxicity mechanisms, also known as Adverse Outcome Pathways (AOPs), that exp...
Article
In the United States, a pre-market regulatory submission for any medical device that comes into contact with either a patient or the clinical practitioner must include an adequate toxicity evaluation of chemical substances that can be released from the device during its intended use. These substances, also referred to as extractables and leachables...
Preprint
Full-text available
Diseases caused by new viruses cost thousands if not millions of human lives and trillions of dollars in damage to the global economy. Despite the rapid development of vaccines for SARS-CoV-2, the lack of small molecule antiviral drugs that work against multiple viral families (broad-spectrum antivirals; BSAs) has left the entire world human popul...
Article
Full-text available
COVID-19 vaccines have been instrumental tools in reducing the impact of SARS-CoV-2 infections around the world by preventing 80% to 90% of hospitalizations and deaths from reinfection, in addition to preventing 40% to 65% of symptomatic illnesses. However, the simultaneous large-scale vaccination of the global population will indubitably unveil he...
Article
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for acute treatment of the disease. We investigate whether compounds that bind the human angiotensin-converting enzyme 2 (ACE2) protein can decrease SARS-CoV-2...
Article
Coronaviruses are a class of single-stranded, positive-sense RNA viruses that have caused three major outbreaks over the past two decades: Middle East respiratory syndrome–related coronavirus (MERS-CoV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). All outbreaks have bee...
Poster
Glycogen Synthase Kinase-3 beta (GSK-3β) is an enzyme playing a crucial role in Alzheimer’s disease by regulating key neuropathological features in the central nervous system. Allosteric inhibitors of this kinase have been validated as a promising therapeutic option due to their selective and subtle modulation, lowering the chance of producing side...
Preprint
Full-text available
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for an acute treatment for the disease. We investigate whether compounds that bind the human ACE2 protein can interrupt SARS-CoV-2 replication without damaging...
Preprint
Full-text available
Coronaviruses are a class of single-stranded, positive-sense RNA viruses that have caused three notable outbreaks over the past two decades: Middle East respiratory syndrome–related coronavirus (MERS-CoV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). All outbreaks have b...
Article
Full-text available
Deep learning has disrupted nearly every field of research, including those of direct importance to drug discovery, such as medicinal chemistry and pharmacology. This revolution has largely been attributed to the unprecedented advances in highly parallelizable graphics processing units (GPUs) and the development of GPU-enabled algorithms. In this R...
Preprint
Full-text available
Objective: Social media mining may provide surprising information about unknown effects of drugs. We endeavored to uncover such unknown drug-disease relationships by text mining of audio record transcripts from the popular NPR show, The People’s Pharmacy. Materials and Methods: We used Google Cloud to transcribe episodes of the NPR podcast into text...
Article
Accurate prediction of binding poses is crucial to structure‐based drug design. We employ two powerful artificial intelligence (AI) approaches, data‐mining and machine‐learning, to design an artificial neural network (ANN)-based pose‐scoring function. It is a simple machine‐learning‐based statistical function that employs frequent geometric and chemic...
Article
Here, we propose a broad concept of ‘Clinical Outcome Pathways’ (COPs), which are defined as a series of key molecular and cellular events that underlie therapeutic effects of drug molecules. We formalize COPs as a chain of the following events: molecular initiating event (MIE) → intermediate event(s) → clinical outcome. We illustrate the concept w...
Preprint
Full-text available
Odorants are typically classified by specially trained individuals using subjective verbal scent descriptors. Herein, we used natural language processing to develop standardized semantic profiles of mono-molecular odorants. We have (i) curated and integrated scent perception data for mono-molecular odorants from 4 online sources; (ii) represented v...
Chapter
Recently, we have introduced fast-calculated and reliable statistical criteria to estimate whether a predictive QSAR model can be built for a given chemical dataset. These modelability criteria were successfully applied to more than 100 datasets with a binary response variable. In this study, we have extended the modelability approach to datasets w...
Preprint
Full-text available
Safety evaluation for medical devices includes the toxicity assessment of chemicals used in device manufacturing, cleansing and/or sterilization that may leach into a patient. According to international standards on biocompatibility assessments (ISO 10993), chemicals that could be released from medical devices should be evaluated for their potentia...
Article
Full-text available
Eye irritation and corrosion are fundamental considerations in developing chemicals to be used in or near the eye, from cleaning products to ophthalmic solutions. Unfortunately, animal testing is currently the standard method to identify compounds that cause eye irritation or corrosion. Yet, there is growing pressure on the part of regulatory agenc...
Article
The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationship...
Article
Full-text available
Myocarditis and pericarditis have recently been linked to COVID-19 vaccines, but the underlying mechanisms have not been explored, nor have these events been compared with cardiac adverse events following non-COVID-19 vaccines. We introduce an informatics approach to study post-vaccine adverse events on the systems biology level to aid the prioritization of effective preventive measures a...
Article
The conventional drug discovery pipeline has proven to be unsustainable for rare diseases. Herein, we discuss recent advances in biomedical knowledge mining applied to discovering therapeutics for rare diseases. We summarize current chemogenomics data of relevance to rare diseases and provide a perspective on the effectiveness of machine learning (...
Article
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accur...
Article
Full-text available
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. Massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this...
Poster
Assessment of pharmacokinetic properties of compounds is a critical step in drug discovery. Measuring hepatic stability is essential in establishing drug accumulation and clearance in the body. Usually, this endpoint is evaluated in vivo, using rats, or in vitro, using human liver microsomes. Recently, in silico approaches have been recognized as al...
Article
We aimed to develop and validate a new graph embedding algorithm for embedding drug-disease-target networks to generate novel drug repurposing hypotheses. Our model denotes drugs, diseases and targets as subjects, predicates and objects, respectively. Each entity is represented by a multidimensional vector and the predicate is regarded as a transla...
Article
Full-text available
The identification of reliable and non-invasive oncology biomarkers remains a main priority in healthcare. There are only a few biomarkers that have been approved as diagnostic for cancer. The most frequently used cancer biomarkers are derived from either biological materials or imaging data. Most cancer biomarkers suffer from a lack of high specif...
Article
Full-text available
Background: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditi...
Article
Many laboratories working in the field of drug discovery use the ZINC database to identify and then acquire commercially available chemicals. However, finding the best deal for a given compound is often time-intensive and laborious, as the process involves searching for all vendors selling the desired compound, comparing prices, and interacting wit...
Preprint
Full-text available
Deep generative neural networks have been used increasingly in computational chemistry for de novo design of molecules with desired properties. Many deep learning approaches employ reinforcement learning for optimizing the target properties of the generated molecules. However, the success of this approach is often hampered by the problem of spars...
Article
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions...
Conference Paper
Full-text available
Dear colleagues worldwide, we are glad to invite you to MOL2NET-07, International Conference on Multidisciplinary Sciences, ISSN: 2624-5078, MDPI SciForum, Basel, Switzerland, 2021. MOL2NET is an international conference formed by an association of several inter-university transatlantic workshops or sessions. These workshops are chaired by one North...
Article
Deep learning models have demonstrated outstanding results in many data-rich areas of research, such as computer vision and natural language processing. Currently, there is a rise of deep learning in computational chemistry and materials informatics, where deep learning could be effectively applied in modeling the relationship between chemical stru...
Preprint
Background: Knowledge graphs are a common form of knowledge representation in biomedicine and many other fields. We developed an open biomedical knowledge graph–based system termed Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways, or ROBOKOP. ROBOKOP consists of both a front-end user interface and a back-end knowledge graph. T...
Article
Full-text available
Background: Knowledge graphs are a common form of knowledge representation in biomedicine and many other fields. We developed an open biomedical knowledge graph–based system termed Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP). ROBOKOP consists of both a front-end user interface and a back-end knowledge graph. The...
Article
The ability of epigenetic markers to affect genome function has enabled transformative changes in drug discovery, especially in cancer and other emerging therapeutic areas. Concordant with the introduction of the term ‘epi-informatics’, the size of the epigenetically relevant chemical space has grown substantially, and so has the number of applicati...
Preprint
Full-text available
Objective: The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-ta...