About
441 Publications · 90,456 Reads · 34,114 Citations

Publications (441)
Skin sensitization is a significant concern for chemical safety assessments. Traditional animal assays often fail to predict human responses accurately, and ethical constraints limit the collection of human data, creating a need for reliable in silico models for predicting skin sensitization. This study introduces HuSSPred, an in silico tool ba...
Recent advances in DNA-encoded library (DEL) screening have created bioactivity datasets containing billions of molecules, unlocking new opportunities for machine learning (ML) in drug discovery. However, most ultra-large DEL libraries are proprietary, limiting the advancement of ML tools for big chemical data analytics and hindering the democratiz...
Knowledge graphs (KGs) represent connections and relationships between real-world entities. We propose a link prediction framework for KGs named Enrichment-Driven GrAph Reasoner (EDGAR), which infers new edges by mining entity-local rules. This approach leverages enrichment analysis, a well-established statistical method used to identify mechanisms...
Over the past several decades, reducing, refining, and replacing animal testing (the three Rs) has been a prominent goal in chemical toxicology.1 The STopTox (Systemic and Topical chemical Toxicity) platform was developed for this objective as an innovative in silico alternative to conventional animal testing for acute systemic and topical toxicity te...
Helicases have emerged as promising targets for the development of antiviral drugs; however, the family remains largely undrugged. To support the focused development of viral helicase inhibitors, we identified, collected, and integrated all chemogenomics data for all available helicases from the ChEMBL database. After thoroughly curating and enrichi...
The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. Like many others, we believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent...
Heparan sulfate (HS), a sulfated polysaccharide abundant in the extracellular matrix, plays pivotal roles in various physiological and pathological processes by interacting with proteins. Investigating the binding selectivity of HS oligosaccharides to target proteins is essential, but the exhaustive inclusion of all possible oligosaccharides in mic...
Expansive Matching of Experts (EMOE) is a novel method that utilizes support-expanding, extrapolatory pseudo-labeling to improve prediction and uncertainty-based rejection on out-of-distribution (OOD) points. We propose an expansive data augmentation technique that generates OOD instances in a latent space, and an empirical trial-based approach to...
Treatment regimens, especially in cancer, often include more than one medicine in order to achieve durable outcomes. Identifying the optimal combination of treatments has historically been done through clinical trial and error. And for many conditions, such as pancreatic cancer, an optimal treatment protocol has remained elusive, and the best avail...
Background: Understanding the potential prenatal and developmental toxicity hazards associated with the use of pharmaceutical and cosmetic products is an important component of women’s health. These hazards can be estimated from the chemical structures of the respective agents using Quantitative Structure-Activity Relationship (QSAR) models; however, the development o...
In deep learning for drug discovery, molecular representations are often based on sequences, known as SMILES, which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecula...
Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet, some of the most commonly used approaches for this task still leverage a brute-force approach. In practice this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases....
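For reference, the brute-force baseline mentioned in this summary amounts to a linear scan that scores every database entry against the query. The sketch below assumes details the abstract does not state: fingerprints are stored as Python integers used as bitsets, similarity is the Tanimoto coefficient, and the function names are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return bin(a & b).count("1") / union


def brute_force_knn(query: int, database: list, k: int) -> list:
    """Return (index, similarity) pairs for the k most similar entries.

    Every fingerprint in the database is scored, so each query costs O(N).
    """
    scores = [(i, tanimoto(query, fp)) for i, fp in enumerate(database)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical 4-bit fingerprints.
db = [0b1011, 0b1111, 0b0001, 0b1010]
print(brute_force_knn(0b1011, db, k=2))
```

Because every query touches every entry, runtime grows linearly with database size, which is the cost that becomes prohibitive for the very large chemical databases the abstract refers to.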
Structure-based virtual screening (SBVS) is a key workflow in computational drug discovery. SBVS models are assessed by measuring the enrichment of known active molecules over decoys in retrospective screens. However, the standard formula for enrichment cannot estimate model performance on very large libraries. Additionally, current screening bench...
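As background for the enrichment discussion above, the standard enrichment factor (EF) of a retrospective screen can be computed as sketched below. This is the conventional textbook formula, not the method proposed in the study; the function name and the toy scores are hypothetical.

```python
def enrichment_factor(scores, labels, fraction):
    """Standard enrichment factor at a given top fraction of a ranked screen.

    EF = (actives in top fraction / size of top fraction) / (all actives / library size).
    Higher scores are treated as better; labels are 1 for actives and 0 for decoys.
    """
    n_total = len(scores)
    n_top = max(1, round(fraction * n_total))
    actives_total = sum(labels)
    if actives_total == 0:
        raise ValueError("the screen contains no actives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_top = sum(label for _, label in ranked[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)


# Hypothetical screen: 10 molecules, 2 actives, both ranked first.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(scores, labels, fraction=0.2))
```

Here the call returns 5.0. Since at most all actives can land in the top fraction, this formula can never exceed the smaller of (library size / top-fraction size) and (library size / number of actives), so a benchmark with few decoys per active cannot express the very high enrichments relevant to screening very large libraries.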
Knowledge graphs are being increasingly used in biomedical research to link large amounts of heterogeneous data and facilitate reasoning across diverse knowledge sources. Wider adoption and exploration of knowledge graphs in the biomedical research community is limited by requirements to understand the underlying graph structure in terms of entity t...
Traditional best practices for Quantitative Structure-Activity Relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key objectives of model development. This study challenges the conventional norms by recommending the use of models with the highest positive predictive value (PPV) built for imbalanced tra...
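The contrast between balanced accuracy and PPV can be made concrete with confusion-matrix arithmetic. The sketch below uses the standard definitions (BA is the mean of sensitivity and specificity; PPV is TP / (TP + FP)); the two models and their counts are hypothetical, chosen to mimic a screen with a 1% hit rate, and are not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary classifier."""
    tp: int
    fp: int
    tn: int
    fn: int

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def balanced_accuracy(self) -> float:
        return (self.sensitivity() + self.specificity()) / 2

    def ppv(self) -> float:
        # Fraction of predicted actives that are truly active.
        return self.tp / (self.tp + self.fp)


# Hypothetical screen: 100 actives and 9,900 inactives.
model_a = Confusion(tp=80, fp=990, tn=8910, fn=20)
model_b = Confusion(tp=40, fp=50, tn=9850, fn=60)
for name, m in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: BA={m.balanced_accuracy():.3f}, PPV={m.ppv():.3f}")
```

Model A prints BA=0.850 and PPV=0.075, while model B prints BA=0.697 and PPV=0.444: the model with the lower balanced accuracy yields roughly six times more true actives per compound selected for testing.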
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational...
Recent rapid expansion of make‐on‐demand, purchasable, chemical libraries comprising dozens of billions or even trillions of molecules has challenged the efficient application of traditional structure‐based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using...
We report the major highlights of the School of Cheminformatics in Latin America, Mexico City, November 24–25, 2022. Six lectures, one workshop, and one roundtable with four editors were presented during an online public event with speakers from academia, big pharma, and public research institutions. One thousand one hundred eighty-one students and...
In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template-based ligand docking program ClusPro ligTBM, also implemented as a public...
Hits from high-throughput screening (HTS) of chemical libraries are often false positives due to their interference with assay detection technology. In response, we generated the largest publicly available library of chemical liabilities and developed "Liability Predictor," a free web tool to predict HTS artifacts. More specifically, we generated,...
COVID-19 vaccines have been instrumental tools in the fight against SARS-CoV-2, helping to reduce disease severity and mortality. At the same time, just like any other therapeutic, COVID-19 vaccines were associated with adverse events. Women have reported menstrual cycle irregularity after receiving COVID-19 vaccines, and this led to renewed fears c...
We introduce STOPLIGHT, a web portal to assist medicinal chemists in prioritizing hits from screening campaigns and selection of compounds for optimization. STOPLIGHT incorporates services to assess 6 physicochemical and structural properties, 6 assay liabilities, and 11 pharmacokinetic properties for any small molecule represented by its SMILES str...
Understanding the origins of past and present viral epidemics is critical in preparing for future outbreaks. Many viruses, including SARS-CoV-2, have led to significant consequences not only due to their virulence, but also because we were unprepared for their emergence. We need to learn from large amounts of data accumulated from well-studied, pas...
Abstract:
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, exposed gaps in our nation's preparedness for both the rapid development of antiviral drugs and the broader public health response. The high transmissibility of this novel pathogenic respiratory virus underscored the need for broad-spectrum antiviral drug development. The Rapidly Emer...
Molecular docking aims to predict the 3D pose of a small molecule in a protein binding site. Traditional docking methods predict ligand poses by minimizing a physics-inspired scoring function. Recently, a diffusion model has been proposed that iteratively refines a ligand pose. We combine these two approaches by training a pose scoring function in...
Diseases caused by new viruses cost thousands if not millions of human lives and trillions of dollars. We have identified, collected, curated, and integrated all chemogenomics data from ChEMBL for 13 emerging viruses that hold the greatest potential threat to global human health. By identifying and solving several challenges related to data annotat...
Glycogen Synthase Kinase-3 beta (GSK-3β) is a validated target enzyme associated with Alzheimer’s Disease (AD). The use of allosteric inhibitors of this enzyme represents a valid and promising therapeutic strategy due to their selective and subtle modulation, with a low probability of producing side effects. Nonetheless, only a few GSK-3β allosteric...
In the wake of the recent COVID-19 pandemic, scientists around the world rushed to deliver numerous CADD (Computer-Aided Drug Discovery) methods and tools that could be reliably used to discover novel drug candidates against the SARS-CoV-2 virus. With that, there emerged a trend of a significant democratization of CADD that contributed to the rapid dev...
Recent attempts at utilizing deep learning for structure-based virtual screening have focused on training models to predict binding affinity from protein-ligand complexes with known crystal structures. The PDBbind dataset is the current standard for training such models, but its small size (less than 20K binding affinity measurements) leads to mode...
Deep generative neural networks have been used increasingly in computational chemistry for de novo design of molecules with desired properties. Many deep learning approaches employ reinforcement learning for optimizing the target properties of the generated molecules. However, the success of this approach is often hampered by the problem of sparse...
Exogenous metal particles and ions from implant devices are known to cause severe toxic events with symptoms ranging from adverse local tissue reactions to systemic toxicities, potentially leading to the development of cancers, heart conditions, and neurological disorders. Toxicity mechanisms, also known as Adverse Outcome Pathways (AOPs), that exp...
In the United States, a pre-market regulatory submission for any medical device that comes into contact with either a patient or the clinical practitioner must include an adequate toxicity evaluation of chemical substances that can be released from the device during its intended use. These substances, also referred to as extractables and leachables...
Diseases caused by new viruses cost thousands if not millions of human lives and trillions of dollars in damage to the global economy. Despite the rapid development of vaccines for SARS-CoV-2, the lack of small molecule antiviral drugs that work against multiple viral families (broad-spectrum antivirals; BSAs) has left the entire world human popul...
COVID-19 vaccines have been instrumental tools in reducing the impact of SARS-CoV-2 infections around the world by preventing 80% to 90% of hospitalizations and deaths from reinfection, in addition to preventing 40% to 65% of symptomatic illnesses. However, the simultaneous large-scale vaccination of the global population will indubitably unveil he...
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for acute treatment of the disease. We investigate whether compounds that bind the human angiotensin-converting enzyme 2 (ACE2) protein can decrease SARS-CoV-2...
Coronaviruses are a class of single-stranded, positive-sense RNA viruses that have caused three major outbreaks over the past two decades: Middle East respiratory syndrome–related coronavirus (MERS-CoV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). All outbreaks have bee...
Glycogen Synthase Kinase-3 beta (GSK-3β) is an enzyme playing a crucial role in Alzheimer’s disease by regulating key neuropathological features in the central nervous system. Allosteric inhibitors of this kinase have been validated as a promising therapeutic option due to their selective and subtle modulation, lowering the chance of producing side...
The COVID-19 pandemic has had enormous health, economic, and social consequences. Vaccines have been successful in reducing rates of infection and hospitalization, but there is still a need for an acute treatment for the disease. We investigate whether compounds that bind the human ACE2 protein can interrupt SARS-CoV-2 replication without damaging...
Coronaviruses are a class of single-stranded, positive-sense RNA viruses that have caused three notable outbreaks over the past two decades: Middle East respiratory syndrome–related coronavirus (MERS-CoV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). All outbreaks have b...
Deep learning has disrupted nearly every field of research, including those of direct importance to drug discovery, such as medicinal chemistry and pharmacology. This revolution has largely been attributed to the unprecedented advances in highly parallelizable graphics processing units (GPUs) and the development of GPU-enabled algorithms. In this R...
Objective
Social media mining may provide surprising information about unknown effects of drugs. We endeavored to uncover such unknown drug-disease relationships by text mining transcripts of audio recordings from the popular NPR show, The People’s Pharmacy.
Materials and Methods
We used Google Cloud to transcribe episodes of the NPR podcast into text...
Accurate prediction of binding poses is crucial to structure‐based drug design. We employ two powerful artificial intelligence (AI) approaches, data‐mining and machine‐learning, to design an artificial neural network (ANN)‐based pose‐scoring function. It is a simple machine‐learning‐based statistical function that employs frequent geometric and chemic...
Here, we propose a broad concept of ‘Clinical Outcome Pathways’ (COPs), which are defined as a series of key molecular and cellular events that underlie therapeutic effects of drug molecules. We formalize COPs as a chain of the following events: molecular initiating event (MIE) → intermediate event(s) → clinical outcome. We illustrate the concept w...
Odorants are typically classified by specially trained individuals using subjective verbal scent descriptors. Herein, we used natural language processing to develop standardized semantic profiles of mono-molecular odorants. We have (i) curated and integrated scent perception data for mono-molecular odorants from 4 online sources; (ii) represented v...
Recently, we have introduced fast-calculated and reliable statistical criteria to estimate whether a predictive QSAR model can be built for a given chemical dataset. These modelability criteria were successfully applied to more than 100 datasets with a binary response variable. In this study, we have extended the modelability approach to datasets w...
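The binary modelability criterion referred to above is commonly described in the QSAR literature as the modelability index (MODI): for each class, the fraction of compounds whose nearest neighbor belongs to the same class, averaged over classes. The sketch below follows that description under stated assumptions: Euclidean distance in descriptor space, brute-force neighbor search, and a hypothetical function name. The summary does not give the exact formula or its extension to continuous responses, so this is illustrative only.

```python
import math
from collections import Counter


def modi(descriptors, labels):
    """Binary modelability index (one common formulation).

    For each class, the fraction of its compounds whose nearest neighbor
    (Euclidean distance, excluding the compound itself) has the same label,
    averaged over all classes.
    """
    if len(descriptors) < 2:
        raise ValueError("need at least two compounds")
    class_sizes = Counter(labels)
    same_class_hits = Counter()
    for i, x in enumerate(descriptors):
        # Brute-force search for the nearest other compound.
        nearest = min(
            (j for j in range(len(descriptors)) if j != i),
            key=lambda j: math.dist(x, descriptors[j]),
        )
        if labels[nearest] == labels[i]:
            same_class_hits[labels[i]] += 1
    return sum(same_class_hits[c] / n for c, n in class_sizes.items()) / len(class_sizes)


# Hypothetical 2-D descriptors forming two tight, well-separated clusters.
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(modi(X, [0, 0, 1, 1]))  # labels follow the clusters
print(modi(X, [0, 1, 0, 1]))  # labels cut across the clusters
```

The first call returns 1.0 (every nearest neighbor shares its label, suggesting a modelable dataset) and the second returns 0.0 (activity is unrelated to descriptor-space proximity).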
Safety evaluation for medical devices includes the toxicity assessment of chemicals used in device manufacturing, cleansing and/or sterilization that may leach into a patient. According to international standards on biocompatibility assessments (ISO 10993), chemicals that could be released from medical devices should be evaluated for their potentia...
Eye irritation and corrosion are fundamental considerations in developing chemicals to be used in or near the eye, from cleaning products to ophthalmic solutions. Unfortunately, animal testing is currently the standard method to identify compounds that cause eye irritation or corrosion. Yet, there is growing pressure on the part of regulatory agenc...
The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-target relationship...
Myocarditis and pericarditis have recently been linked to COVID-19 vaccines, but without exploring the underlying mechanisms or comparing them to cardiac adverse events following non-COVID-19 vaccines. We introduce an informatics approach to study post-vaccine adverse events on the systems biology level to aid the prioritization of effective preventive measures a...
The conventional drug discovery pipeline has proven to be unsustainable for rare diseases. Herein, we discuss recent advances in biomedical knowledge mining applied to discovering therapeutics for rare diseases. We summarize current chemogenomics data of relevance to rare diseases and provide a perspective on the effectiveness of machine learning (...
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accur...
COVID-19 has resulted in huge numbers of infections and deaths worldwide and brought the most severe disruptions to societies and economies since the Great Depression. A massive experimental and computational research effort to understand and characterize the disease and rapidly develop diagnostics, vaccines, and drugs has emerged in response to this...
Assessment of pharmacokinetic properties of compounds is a critical step in drug discovery. Measuring hepatic stability is essential in establishing the drug accumulation and clearance in the body. Usually, this endpoint is evaluated in vivo, using rats, or in vitro, using human liver microsomes. Recently, in silico approaches have been recognized as al...
We aimed to develop and validate a new graph embedding algorithm for embedding drug-disease-target networks to generate novel drug repurposing hypotheses. Our model denotes drugs, diseases and targets as subjects, predicates and objects, respectively. Each entity is represented by a multidimensional vector and the predicate is regarded as a transla...
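The translation idea described above (subject + predicate ≈ object) can be illustrated with a TransE-style plausibility score. This is a generic sketch, not the authors' algorithm: the Euclidean norm, the function name, and the toy vectors are assumptions. Following the abstract's convention, drugs are subjects, diseases are predicates, and targets are objects.

```python
import math


def translation_score(subject, predicate, obj):
    """TransE-style plausibility: negative Euclidean norm of (subject + predicate - object).

    A score of 0 means the object sits exactly at subject + predicate;
    more negative scores mean a less plausible triple.
    """
    return -math.sqrt(sum((s + p - o) ** 2 for s, p, o in zip(subject, predicate, obj)))


# Hypothetical 3-D embeddings (not learned from any real network).
disease = [0.5, 0.4, 0.3]   # predicate vector
target = [0.7, 0.5, 0.3]    # object vector
drugs = {"drug_a": [0.2, 0.1, 0.0], "drug_b": [0.9, -0.4, 0.6]}

# Rank candidate drugs for this disease-target pair, most plausible first.
ranked = sorted(drugs, key=lambda d: translation_score(drugs[d], disease, target), reverse=True)
print(ranked)
```

Here drug_a + disease lands on the target vector, so drug_a ranks first; in a trained model, high-scoring triples that are absent from the network would be the candidate repurposing hypotheses.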
The identification of reliable and non-invasive oncology biomarkers remains a main priority in healthcare. There are only a few biomarkers that have been approved as diagnostic for cancer. The most frequently used cancer biomarkers are derived from either biological materials or imaging data. Most cancer biomarkers suffer from a lack of high specif...
Background:
Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditi...
Many laboratories working in the field of drug discovery use the ZINC database to identify and then acquire commercially available chemicals. However, finding the best deal for a given compound is often time-intensive and laborious, as the process involves searching for all vendors selling the desired compound, comparing prices, and interacting wit...
Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions...
Dear colleagues worldwide, we are glad to invite you to MOL2NET-07, International Conference on Multidisciplinary Sciences, ISSN: 2624-5078, MDPI SciForum, Basel, Switzerland, 2021. MOL2NET is an international conference formed by an association of several inter-university transatlantic workshops or sessions. These workshops are chaired by one North...
Deep learning models have demonstrated outstanding results in many data-rich areas of research, such as computer vision and natural language processing. Currently, there is a rise of deep learning in computational chemistry and materials informatics, where deep learning could be effectively applied in modeling the relationship between chemical stru...
BACKGROUND
Knowledge graphs are a common form of knowledge representation in biomedicine and many other fields. We developed an open biomedical knowledge graph–based system termed Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways, or ROBOKOP. ROBOKOP consists of both a front-end user interface and a back-end knowledge graph. T...
Background
Knowledge graphs are a common form of knowledge representation in biomedicine and many other fields. We developed an open biomedical knowledge graph–based system termed Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP). ROBOKOP consists of both a front-end user interface and a back-end knowledge graph. The...
The ability of epigenetic markers to affect genome function has enabled transformative changes in drug discovery, especially in cancer and other emerging therapeutic areas. Concordant with the introduction of the term ‘epi-informatics’, the size of the epigenetically relevant chemical space has grown substantially, and so has the number of applicati...
Objective: The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug-ta...