Olaf Wiest’s research while affiliated with Université Notre Dame d'Haïti and other places

Publications (297)


(A) General transfer learning workflow. (B) Pericyclic reactions and their mechanistic similarities
Performance of NERF models generated by transfer learning for predicting the outcomes of (A) Cope and Claisen rearrangements and (B) Diels–Alder reactions, as a function of (i) the chemistry of the pretrained model and (ii) the amount of training data. The models were trained using the following splits of the 3289-reaction Cope and Claisen dataset or the 9537-reaction Diels–Alder dataset (training : validation : testing): 10 : 45 : 45, 40 : 30 : 30, 60 : 20 : 20, 80 : 10 : 10, and 85 : 5 : 10. The performance of models built without pre-training is also shown for comparison. Top-1 prediction accuracies are shown as the average of ten runs
Increase in Top-1 accuracy when pre-training on USPTO-MIT or Cope and Claisen data, compared with a no-pre-training approach that includes an additional 49 selected examples of inverse-electron-demand Diels–Alder reactions on triazines and oxazoles
Effects of pre-training chemistry on prediction accuracies for NERF models where each pre-training dataset comprised 1000 reactions. The height of each bar indicates the change in Top-1 accuracy relative to the baseline (non-pretrained model). (A) Models for the Cope and Claisen rearrangements were trained on 328 reactions (10% of the dataset). (B) Models for the Diels–Alder reactions were likewise trained on 328 reactions (3% of the dataset)
UMAP of rxnfp fingerprints of USPTO-MIT, Diels–Alder, Cope and Claisen, Ene, and Nazarov reaction datasets
Improving reaction prediction through chemically aware transfer learning
  • Article

March 2025 · 14 Reads

Angus Keto

·

·

Nils Gönnheimer

·

[...]

·

Olaf Wiest

Practical applications of machine learning (ML) to new chemical domains are often hindered by data scarcity. Here we show how data gaps can be circumvented by means of transfer learning that leverages chemically relevant pre-training data. Case studies are presented in which the outcomes of two classes of pericyclic reactions are predicted: [3,3] rearrangements (Cope and Claisen rearrangements) and [4 + 2] cycloadditions (Diels–Alder reactions). Using the graph-based generative algorithm NERF, we evaluate the data efficiencies achieved with different starting models that we pre-trained on datasets of different sizes and chemical scope. We show that the greatest data efficiency is obtained when the pre-training is performed on smaller datasets of mechanistically related reactions (Diels–Alder, Cope and Claisen, Ene, and Nazarov) rather than >50× larger datasets of mechanistically unrelated reactions (USPTO-MIT). These small bespoke datasets were more efficient in both low re-training and low pre-training regimes, and are thus recommended alternatives to large diverse datasets for pre-training ML models.
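As an illustration of the arithmetic behind the figure captions for this paper, the sketch below converts the training : validation : testing ratios into reaction counts and computes a Top-1 accuracy averaged over repeated runs. It is a minimal sketch under stated assumptions, not the authors' code: the helper names (`split_sizes`, `top1_accuracy`, `mean_over_runs`) are hypothetical, counts are assumed to be rounded down (which reproduces the 328 training reactions quoted for the 10% Cope and Claisen split), and a prediction is assumed to be correct when its top-ranked product string exactly matches the recorded product.

```python
from statistics import mean

# Dataset sizes quoted in the figure captions.
DATASETS = {"Cope/Claisen": 3289, "Diels-Alder": 9537}

# Training : validation : testing ratios (percent) listed in the caption.
SPLITS = [(10, 45, 45), (40, 30, 30), (60, 20, 20), (80, 10, 10), (85, 5, 10)]


def split_sizes(n_reactions, ratio):
    """Return (train, val, test) counts for a percentage split.

    Assumption (hypothetical helper): train and validation counts are
    rounded down and the remainder goes to the test set, so the three
    counts always sum to n_reactions.
    """
    train_pct, val_pct, test_pct = ratio
    assert train_pct + val_pct + test_pct == 100, "ratios must sum to 100"
    n_train = n_reactions * train_pct // 100
    n_val = n_reactions * val_pct // 100
    n_test = n_reactions - n_train - n_val
    return n_train, n_val, n_test


def top1_accuracy(predictions, targets):
    """Fraction of reactions whose highest-ranked prediction matches the target.

    `predictions` is a list of ranked candidate lists (best first).
    Exact string comparison of product SMILES is an assumption here.
    """
    hits = sum(1 for ranked, true in zip(predictions, targets)
               if ranked and ranked[0] == true)
    return hits / len(targets)


def mean_over_runs(per_run_accuracies):
    """Average Top-1 accuracy over repeated runs (ten in the caption)."""
    return mean(per_run_accuracies)


if __name__ == "__main__":
    for name, n in DATASETS.items():
        for ratio in SPLITS:
            print(name, ratio, split_sizes(n, ratio))
```

Assigning the remainder to the test set keeps the three counts summing to the dataset size; the rounding rule actually used in the study is not stated in the captions.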

Artificial Intelligence in Spectroscopy: Advancing Chemistry from Prediction to Generation and Beyond

February 2025 · 96 Reads

The rapid advent of machine learning (ML) and artificial intelligence (AI) has catalyzed major transformations in chemistry, yet the application of these methods to spectroscopic and spectrometric data, referred to as Spectroscopy Machine Learning (SpectraML), remains relatively underexplored. Modern spectroscopic techniques (MS, NMR, IR, Raman, UV-Vis) generate an ever-growing volume of high-dimensional data, creating a pressing need for automated and intelligent analysis beyond traditional expert-based workflows. In this survey, we provide a unified review of SpectraML, systematically examining state-of-the-art approaches for both forward tasks (molecule-to-spectrum prediction) and inverse tasks (spectrum-to-molecule inference). We trace the historical evolution of ML in spectroscopy, from early pattern recognition to the latest foundation models capable of advanced reasoning, and offer a taxonomy of representative neural architectures, including graph-based and transformer-based methods. Addressing key challenges such as data quality, multimodal integration, and computational scalability, we highlight emerging directions such as synthetic data generation, large-scale pretraining, and few- or zero-shot learning. To foster reproducible research, we also release an open-source repository containing recent papers and their corresponding curated datasets (https://github.com/MINE-Lab-ND/SpectrumML_Survey_Papers). Our survey serves as a roadmap for researchers, guiding progress at the intersection of spectroscopy and AI.


Watching Pseudomonas mevalonii HMG-CoA Reductase in Action

January 2025 · 8 Reads

HMG-CoA reductase catalyzes the interconversion of a thioester, HMG-CoA, and mevalonic acid, a key step in the isoprenoid pathway, through a complex reaction mechanism involving three distinct chemical steps, two molecules of cofactor, and large-scale rearrangements of the enzyme. Here, we investigate the second step, the formation of a thiohemiacetal from CoA and mevaldehyde, using time-resolved crystallography and molecular dynamics (MD) simulations. After the reaction is triggered by a pH jump from pH 6.7 to pH 9, formation of the carbon–sulfur bond can be observed in the two structures obtained at 2.5 and 4 minutes. These structures, which are close to the activated complex of the reaction, serve as starting points for MD simulations of different possible protonation states of the catalytically active residues. Changes to the active-site geometry, specifically of residues Ser 85, Glu 83, and His 381, which are important for catalysis of the reaction, are discussed in detail. This work demonstrates the applicability of combining time-resolved crystallography using a pH trigger with MD simulations to obtain a detailed view of a complex reaction in an enzyme active site.
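Because pH is the negative base-10 logarithm of the proton activity, the jump from pH 6.7 to pH 9 used to trigger the reaction lowers the proton concentration by a factor of 10^(9 - 6.7), roughly 200. The helper below is a hypothetical name introduced only for illustration; it treats activity as concentration, which is an approximation.

```python
def proton_fold_decrease(ph_start: float, ph_end: float) -> float:
    """Factor by which [H+] falls when the pH rises from ph_start to ph_end.

    Uses pH = -log10([H+]), treating activity as concentration (an
    approximation), so [H+]_start / [H+]_end = 10 ** (ph_end - ph_start).
    """
    return 10 ** (ph_end - ph_start)


if __name__ == "__main__":
    # pH jump used to trigger the reaction in the crystals: 6.7 -> 9
    print(round(proton_fold_decrease(6.7, 9.0)))  # 200
```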




Synthesis of Caged HMG-CoA Reductase Substrates for Elucidation of Cellular Pathways

October 2024 · 12 Reads · 1 Citation

The synthesis of photocaged substrates of the biologically important enzyme HMG-CoA reductase is reported. HMG-CoA bearing a p-hydroxyphenacyl (pHP) photocage moiety was synthesized in an overall yield of 14% over seven steps, in addition to caged forms of mevalonate and mevaldehyde. The absorption maximum and the quantum yield for decaging of the photocaged compounds are pH-dependent: λmax = 330 nm and a quantum yield of 5% at pH 9.1, but λmax = 290 nm and a quantum yield of 16% at pH 6.7.
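An overall yield of 14% over seven steps corresponds to a mean yield of about 75.5% per step, if the route is treated as a strictly linear sequence whose overall yield is the product of the step yields. That linearity is an assumption for illustration; the abstract does not describe the route's topology. A minimal sketch of the geometric-mean arithmetic, using a hypothetical helper name:

```python
def average_step_yield(overall_yield: float, n_steps: int) -> float:
    """Geometric-mean yield per step implied by an overall yield.

    Assumes a strictly linear sequence in which the overall yield is the
    product of the individual step yields.
    """
    if not 0 < overall_yield <= 1 or n_steps < 1:
        raise ValueError("overall_yield must be in (0, 1] and n_steps >= 1")
    return overall_yield ** (1 / n_steps)


if __name__ == "__main__":
    per_step = average_step_yield(0.14, 7)
    print(f"{per_step:.1%}")  # 75.5%
```

The geometric mean is used rather than the arithmetic mean because step yields multiply.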


Figure 3. m56Hc increases thermal stability of NPC1 protein.
Figure 4. Generation of humanized I1061T NPC1 mice. (A) Depiction of the editing strategy used to create the humanized I1061T NPC1 mouse. The human NPC1 cDNA sequence encoding the I1061T NPC1 mutation was inserted into exon 2 of mouse Npc1, downstream of the signal peptide sequence, by CRISPR/Cas9 homology-directed repair. Restriction enzymes (BclI, BspHI) and Southern blotting (5' and 3' probes) were used to verify proper insertion, and PCR primers (5' Universal Forward Primer, 3' Humanized Reverse Primer, 3' Mouse Reverse Primer) were designed for mouse genotyping.
Figure 6. Age-dependent neurodegeneration in humanized I1061T NPC1 mice. (A) Midline sagittal sections of cerebellar lobules V-VI were stained with H&E and imaged. Top, Npc1+/-; bottom, hI10/-. Scale bar = 50 μm. (B) Purkinje cell count and molecular layer thickness in cerebellar lobules V-VI were quantified in Npc1+/- and hI10/- mice at 16 and 52 weeks of age. (C) Cerebella of 16-week-old mice were stained for GFAP (green); nuclei were stained with DAPI (blue). Scale bar = 100 μm. (D) Purkinje neurons in the cerebellum of 52-week-old mice were stained for calbindin (red) and LAMP1 (green); the LAMP1-positive area is quantified at right. Scale bar = 10 μm.
Figure 7. Liver pathology in humanized I1061T NPC1 mice. (A) Serum AST and ALT were measured in mice at 52 weeks. U/L, units/liter. (B) Livers from 52-week-old Npc1+/- and hI10/- mice were stained with H&E and imaged. Arrows highlight clusters of foamy macrophages. Scale bar = 50 μm. (C-G) Mass spectrometry lipidomic analysis of liver tissue from Npc1+/- and hI10/- mice at 52 weeks showed significant increases of (C) free cholesterol, (D) sphingomyelins (SM), (E)
Mutant induced neurons and humanized mice enable identification of Niemann-Pick C1 proteostatic therapies

August 2024 · 60 Reads

JCI Insight

Therapeutics that rescue folding, trafficking, and function of disease-causing missense mutants are sought for a host of human diseases, but efforts to leverage model systems to test emerging strategies have met with limited success. Such is the case for Niemann-Pick type C1 disease, a lysosomal disorder characterized by impaired intracellular cholesterol trafficking, progressive neurodegeneration, and early death. NPC1, a multipass transmembrane glycoprotein, is synthesized in the endoplasmic reticulum and traffics to late endosomes/lysosomes, but this process is often disrupted in disease. We sought to identify small molecules that promote folding and enable lysosomal localization and functional recovery of mutant NPC1. We leveraged a panel of isogenic human induced neurons expressing distinct NPC1 missense mutations. We used this panel to rescreen compounds that were reported previously to correct NPC1 folding and trafficking. We established mo56-hydroxycholesterol (mo56Hc) as a potent pharmacological chaperone for several NPC1 mutants. Furthermore, we generated mice expressing human I1061T NPC1, a common mutation in patients. We demonstrated that this model exhibited disease phenotypes and recapitulated the protein trafficking defects, lipid storage, and response to mo56Hc exhibited by human cells expressing I1061T NPC1. These tools established a paradigm for testing and validation of proteostatic therapeutics as an important step towards the development of disease-modifying therapies.


Large Language Model Based Multi-agents: A Survey of Progress and Challenges

August 2024 · 155 Reads · 342 Citations

Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Owing to their notable capabilities in planning and reasoning, LLMs have been used as autonomous agents for the automatic execution of various tasks. Recently, LLM-based agent systems have rapidly evolved from single-agent planning or decision-making to operating as multi-agent systems, enhancing their ability in complex problem-solving and world simulation. To offer an overview of this dynamic field, we present this survey, which discusses in depth the essential aspects and challenges of LLM-based multi-agent (LLM-MA) systems. Our objective is to provide readers with an in-depth understanding of these key points: the domains and settings where LLM-MA systems operate or simulate; the profiling and communication methods of these agents; and the means by which these agents develop their skills. For those interested in delving into this field, we also summarize the commonly used datasets and benchmarks. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository (github.com/taichengguo/LLM_MultiAgents_Survey_Papers) dedicated to tracking LLM-MA research.



MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

June 2024 · 85 Reads

Recently, Large Language Models (LLMs), with their strong task-handling capabilities, have shown remarkable advancements across a spectrum of fields beyond natural language understanding. However, their proficiency in the chemistry domain remains limited, especially in solving professional molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e., SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by designing and equipping them with a multi-modal external module, namely MolX. In particular, instead of directly using a SMILES string to represent a molecule, we use specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. Moreover, a human-defined molecular fingerprint is incorporated to leverage its embedded domain knowledge. Then, to establish an alignment between MolX and the LLM's textual input space, the whole model, in which the LLM is frozen, is pre-trained with a versatile strategy including a diverse set of tasks. Extensive experimental evaluations demonstrate that our proposed method introduces only a small number of trainable parameters while outperforming baselines on various downstream molecule-related tasks ranging from molecule-to-text translation to retrosynthesis, with and without fine-tuning the LLM.


Citations (58)


... 2024 [16] Survey of LLM-based MAS, including agent-environment interface, LLM agent characterization, inter-agent comm., capability acquisition, and applications. ...

Reference:

Internet of Agents: Fundamentals, Applications, and Challenges
Large Language Model Based Multi-agents: A Survey of Progress and Challenges
  • Citing Conference Paper
  • August 2024

... [2,3] Recent years have witnessed an increasing application of machine learning (ML) approaches in catalysis. [4][5][6][7] For instance, various ML methods have been developed for reactivity and selectivity prediction. [8][9][10][11] The use of ML models for catalyst design and reaction optimization has also shown great potential. ...

Data-Efficient, Chemistry-Aware Machine Learning Predictions of Diels-Alder Reaction Outcomes
  • Citing Article
  • June 2024

Journal of the American Chemical Society

... In drug discovery and chemical optimization, the accurate prediction of molecular properties plays a crucial role in guiding experimental design and compound selection [1][2][3]. Among these predictions, chemists often prioritize specific ranges of yield and potency, particularly extreme values, as these are most likely to become viable drug candidates [4][5][6][7][8]. This introduces a domain preference bias toward higher values, which are considered more relevant for practical applications. ...

Are we Making Much Progress? Revisiting Chemical Reaction Yield Prediction from an Imbalanced Regression Perspective
  • Citing Conference Paper
  • May 2024

... [11][12][13] The VP40 crystal structure demonstrated formation of a high affinity butterfly-shaped dimer with a distinct N-terminal domain (NTD) and C-terminal domain (CTD) connected by a flexible linker 5. The NTD has a critical role in forming NTD-NTD interactions at the dimer interface 5,7,14,15 and formation of the octameric ring through a different NTD interface. 6,8 Meanwhile, the CTD is required for binding to host cell plasma membrane lipids and formation of dimer-dimer interactions. ...

Computational and experimental identification of keystone interactions in Ebola virus matrix protein VP40 dimer formation

... Upon light activation at a characteristic wavelength, the substrate will be released from the cage, subsequently diffusing into protein active sites. Photocages also allow perturbations such as pH jumps, a useful tool for investigating protonation states and observing intermediate states by halting reactions within pH-sensitive enzymes (Purohit et al., 2024). ...

pH-dependent reaction triggering in PmHMGR crystals for time-resolved crystallography
  • Citing Article
  • February 2024

Biophysical Journal

... Several review and perspective articles have been published on this topic. [12][13][14][15][16][17][18][19] Also, the closely related field of chemoselectivity is not discussed in detail and is only briefly mentioned where appropriate. 20 While the terms regioselectivity and site-selectivity are often used synonymously, they can serve to describe slightly different observations. ...

Interplay of Computation and Experiment in Enantioselective Catalysis: Rationalization, Prediction, and─Correction?
  • Citing Article
  • October 2023

ACS Catalysis

... [11][12][13] The VP40 crystal structure demonstrated formation of a high affinity butterfly-shaped dimer with a distinct N-terminal domain (NTD) and C-terminal domain (CTD) connected by a flexible linker 5. The NTD has a critical role in forming NTD-NTD interactions at the dimer interface 5,7,14,15 and formation of the octameric ring through a different NTD interface. 6,8 Meanwhile, the CTD is required for binding to host cell plasma membrane lipids and formation of dimer-dimer interactions. ...

Elucidating Residue-Level Determinants Affecting Dimerization of Ebola Virus Matrix Protein Using High-Throughput Site Saturation Mutagenesis and Biophysical Approaches
  • Citing Article
  • July 2023

The Journal of Physical Chemistry B

... Small language models have considerably smaller computing requirements than LLMs and can often be run on consumer-grade hardware. Emerging evidence suggests that for specific tasks, as opposed to general purposes, small language models can compete with LLMs in terms of their performance (Guo et al., 2023;Luo et al., 2023;Bolton et al., 2024). ...

Graph-based Molecular Representation Learning

... For instance, in Bran et al. [23], researchers augmented LLMs by providing access to expert-designed tools for drug discovery, materials design and organic synthesis. The second category involves using LLMs directly for downstream tasks such as property prediction, reagent selection and molecule captioning [24,25,26,27]. In Guo et al. [24], they benchmarked LLMs in zero- and few-shot settings, demonstrating their capabilities in explaining, understanding and reasoning over chemistry. ...

What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks

... This action is involved in the metabolism of clinical drugs, as well as cholesterol and fatty acid metabolism. Finally, Patel et al. have undertaken QM-cluster and QM/MM studies of thiohemiacetal intermediates with an enzyme whose active site is similar to thioesterases, demonstrating that hydrolysis with a water molecule is not possible in their case [8]. ...

Computational Study of Base-Catalyzed Thiohemiacetal Decomposition in Pseudomonas mevalonii HMG-CoA Reductase
  • Citing Article
  • May 2023

The Journal of Physical Chemistry B