October 2024 · 3 Reads
October 2024 · 3 Reads
August 2024 · 36 Reads · 3 Citations
European Journal of Cancer
July 2024 · 18 Reads
The successful application of machine learning (ML) in catalyst design relies on high-quality and diverse data to ensure effective generalization to novel compositions, thereby aiding in catalyst discovery. However, due to complex interactions, catalyst design has long relied on trial-and-error, a costly and labor-intensive process leading to scarce data that is heavily biased towards undesired, low-yield catalysts. Despite the rise of ML in this field, most efforts have not focused on dealing with the challenges presented by such experimental data. To address these challenges, we introduce a robust machine learning and explainable AI (XAI) framework to accurately classify the catalytic yield of various compositions and identify the contributions of individual components. This framework combines a series of ML practices designed to handle the scarcity and imbalance of catalyst data. We apply the framework to classify the yield of various catalyst compositions in oxidative methane coupling, and use it to evaluate the performance of a range of ML models: tree-based models, logistic regression, support vector machines, and neural networks. These experiments demonstrate that the methods used in our framework lead to a significant improvement in the performance of all but one of the evaluated models. Additionally, the decision-making process of each ML model is analyzed by identifying the most important features for predicting catalyst performance using XAI methods. Our analysis found that XAI methods, providing class-aware explanations, such as Layer-wise Relevance Propagation, identified key components that contribute specifically to high-yield catalysts. These findings align with chemical intuition and existing literature, reinforcing their validity. We believe that such insights can assist chemists in the development and identification of novel catalysts with superior performance.
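The abstract does not say which practices the framework uses against class imbalance. One common practice is to reweight classes so the rare high-yield class is not drowned out by the many low-yield examples. The sketch below assumes that practice. It uses the "balanced" weighting formula n_samples / (n_classes × class count), the same formula scikit-learn applies for `class_weight="balanced"`, and a weighted binary cross-entropy. The function names and toy labels are hypothetical.

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_in_class), so every
    class contributes the same total weight regardless of its frequency."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def weighted_binary_cross_entropy(y_true, p_pred, class_weights, eps=1e-12):
    """Mean binary cross-entropy in which each sample is scaled by its class weight."""
    p = np.clip(p_pred, eps, 1 - eps)
    w = np.array([class_weights[int(label)] for label in y_true])
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.sum(w * losses) / np.sum(w))

# Hypothetical toy yield labels: 1 = high-yield catalyst (rare), 0 = low-yield (common)
y = np.array([0, 0, 0, 0, 0, 0, 0, 1])
weights = balanced_class_weights(y)
```

With seven low-yield examples and one high-yield example, the single positive receives weight 4.0, and each negative receives 4/7. Both classes therefore carry the same total weight of 4.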
December 2023 · 39 Reads · 1 Citation
October 2023 · 184 Reads
Despite advances in precision oncology, clinical decision-making still relies on limited parameters and expert knowledge. To address this limitation, we combined multimodal real-world data and explainable artificial intelligence (xAI) to introduce novel AI-derived (AID) markers for clinical decision support. We used deep learning to model the outcome of 15,726 patients across 38 solid cancer entities based on 350 markers, including clinical records, image-derived body compositions, and mutational tumor profiles. xAI determined the prognostic contribution of each clinical marker at the patient level and identified 114 key markers that accounted for 90% of the neural network's decision process. Moreover, xAI enabled us to uncover 1,373 prognostic interactions between markers. Our approach was validated in an independent cohort of 3,288 lung cancer patients from a US nationwide electronic health record-derived database. These results show the potential of xAI to transform the assessment of clinical parameters and enable personalized, data-driven cancer care.
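The claim that 114 markers "accounted for 90% of the neural network's decision process" implies a cumulative-share computation over per-marker attributions. The sketch below is one plausible reading. It assumes importance means the mean absolute per-patient attribution, ranks markers by that importance, and keeps the smallest top-ranked set whose share reaches 90%. The paper's exact definition may differ, and `markers_for_share` is a hypothetical name.

```python
import numpy as np

def markers_for_share(attributions, share=0.9):
    """Return indices of the smallest set of top-ranked markers whose mean
    absolute attribution accounts for at least `share` of the total.

    attributions: array (n_patients, n_markers) of per-patient relevance scores.
    """
    importance = np.abs(attributions).mean(axis=0)
    order = np.argsort(-importance, kind="stable")    # most important first
    cumulative = np.cumsum(importance[order]) / importance.sum()
    k = int(np.searchsorted(cumulative, share)) + 1   # first rank reaching `share`
    return order[: min(k, len(order))]

# Toy attributions for 2 patients x 4 markers; mean |attribution| = [5, 3, 1.5, 0.5]
attr = np.array([[5.0, -3.0, 1.5, 0.5],
                 [-5.0, 3.0, -1.5, -0.5]])
print(markers_for_share(attr))  # [0 1 2]
```

Here the cumulative shares are 0.5, 0.8, 0.95, and 1.0, so three markers are needed to reach 90%.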
July 2023 · 53 Reads
Model overconfidence and poor calibration are common in machine learning and difficult to account for when applying standard empirical risk minimization. In this work, we propose a novel method to alleviate these problems that we call odd-k-out learning (OKO), which minimizes the cross-entropy error for sets rather than for single examples. This naturally allows the model to capture correlations across data examples and achieves both better accuracy and calibration, especially in limited training data and class-imbalanced regimes. Perhaps surprisingly, OKO often yields better calibration even when training with hard labels and dropping any additional calibration parameter tuning, such as temperature scaling. We provide theoretical justification, establishing that OKO naturally yields better calibration, and provide extensive experimental analyses that corroborate our theoretical findings. We emphasize that OKO is a general framework that can be easily adapted to many settings and the trained model can be applied to single examples at inference time, without introducing significant run-time overhead or architecture changes.
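The core idea of computing cross-entropy on sets rather than single examples can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation. It assumes each training set contains two examples of a "pair" class plus one example from each of k other classes. It also assumes the per-example logits are summed into one set-level logit vector whose target is the pair class. The sampler and all names are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()                      # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def sample_oko_set(labels, k, rng):
    """Hypothetical sampler: two examples of a randomly chosen 'pair' class
    plus one example from each of k distinct other classes (the odd ones out)."""
    classes = np.unique(labels)
    pair_class = rng.choice(classes)
    pair_idx = rng.choice(np.flatnonzero(labels == pair_class), size=2, replace=False)
    odd_classes = rng.choice(classes[classes != pair_class], size=k, replace=False)
    odd_idx = [rng.choice(np.flatnonzero(labels == c)) for c in odd_classes]
    return np.concatenate([pair_idx, odd_idx]), int(pair_class)

def oko_set_loss(set_logits, pair_class):
    """Set-level cross-entropy: per-example logits of shape (set_size, n_classes)
    are summed into one logit vector, which should point to the pair class."""
    return -log_softmax(set_logits.sum(axis=0))[pair_class]

rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
idx, pair_class = sample_oko_set(labels, k=2, rng=rng)
```

Only the training loss changes. At inference, the same network still produces logits for one example at a time, which matches the abstract's point about negligible run-time overhead.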
May 2023 · 168 Reads · 10 Citations
Single-pulse electrical stimulation in the nervous system, often called cortico-cortical evoked potential (CCEP) measurement, is an important technique to understand how brain regions interact with one another. Voltages are measured from implanted electrodes in one brain area while stimulating another with brief current impulses separated by several seconds. Historically, researchers have tried to understand the significance of evoked voltage polyphasic deflections by visual inspection, but no general-purpose tool has emerged to understand their shapes or describe them mathematically. We describe and illustrate a new technique to parameterize brain stimulation data, where voltage response traces are projected into one another using a semi-normalized dot product. The number of post-stimulation timepoints included in the dot product is varied to obtain a temporal profile of structural significance, and the peak of the profile uniquely identifies the duration of the response. Using linear kernel PCA, a canonical response shape is obtained over this duration, and then single-trial traces are parameterized as a projection of this canonical shape with a residual term. Such parameterization allows dissimilar trace shapes from different brain areas to be directly compared by quantifying cross-projection magnitudes, response duration, canonical shape projection amplitudes, signal-to-noise ratios, explained variance, and statistical significance. Artifactual trials are automatically identified as outliers in sub-distributions of cross-projection magnitude and rejected. This technique, which we call “Canonical Response Parameterization” (CRP), dramatically simplifies the study of CCEP shapes and may also be applied in a wide range of other settings involving event-triggered data.
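The first two CRP steps described above can be sketched with NumPy: the duration profile from semi-normalized cross-projections, and the canonical shape from linear-kernel PCA. This is a minimal sketch under stated assumptions. The profile is taken as the plain mean of the off-diagonal projections, which may differ from the statistic used in the paper. The Gram matrix is left uncentered. Artifact rejection and the significance statistics are omitted. Function names and the synthetic data are hypothetical.

```python
import numpy as np

def projection_profile(V):
    """V: (n_trials, n_samples) traces aligned to stimulation onset.
    For each candidate duration tau, project every trace onto every other
    trace's unit vector (semi-normalized dot product) and average the
    off-diagonal projections."""
    n_trials, n_samples = V.shape
    off_diag = ~np.eye(n_trials, dtype=bool)
    profile = np.empty(n_samples)
    for tau in range(1, n_samples + 1):
        X = V[:, :tau]
        norms = np.maximum(np.linalg.norm(X, axis=1), 1e-12)
        P = (X @ X.T) / norms[np.newaxis, :]   # P[i, j] = X_i . X_j / ||X_j||
        profile[tau - 1] = P[off_diag].mean()
    return profile

def canonical_shape(V, duration):
    """Linear-kernel PCA over the response window: the leading eigenvector of
    the trial-by-trial Gram matrix weights the trials into a canonical shape."""
    X = V[:, :duration]
    _, eigvecs = np.linalg.eigh(X @ X.T)       # eigenvalues in ascending order
    C = X.T @ eigvecs[:, -1]
    C /= np.linalg.norm(C)
    if (X @ C).mean() < 0:                     # eigenvector sign is arbitrary
        C = -C
    alpha = X @ C                              # per-trial projection amplitude
    residual = X - np.outer(alpha, C)          # what the canonical shape misses
    return C, alpha, residual

# Synthetic example: a 50-sample half-sine response plus noise, 20 trials
rng = np.random.default_rng(0)
t = np.arange(200)
shape = np.where(t < 50, np.sin(np.pi * t / 50), 0.0)
V = shape + 0.1 * rng.standard_normal((20, 200))
profile = projection_profile(V)
duration = int(np.argmax(profile)) + 1
C, alpha, residual = canonical_shape(V, duration)
```

After the response ends, extra timepoints add mostly noise to the normalizing vector length but contribute little to the shared projection. The profile therefore falls, and its peak lands near the true 50-sample response duration.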
January 2023 · 237 Reads · 97 Citations
Science Advances
Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, currently scale only up to a few dozen atoms because model complexity grows considerably with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations. All atomic degrees of freedom remain correlated in the global sGDML FF, allowing the accurate description of complex molecules and materials that present phenomena with far-reaching characteristic correlation lengths. We assess the accuracy and efficiency of sGDML on a newly developed MD22 benchmark dataset containing molecules from 42 to 370 atoms. The robustness of our approach is demonstrated in nanosecond path-integral molecular dynamics simulations for supramolecular complexes in the MD22 dataset.
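The abstract does not detail the solver. The general idea behind training a kernel model "exactly" but iteratively is to solve the regularized kernel system (K + λI)α = y with a Krylov method such as conjugate gradient. This method needs only matrix-vector products and avoids factorizing K. The sketch below illustrates that idea on plain kernel ridge regression with a scalar toy target. It does not reproduce sGDML's gradient-domain kernel, symmetrization, or any preconditioner the paper may use. The kernel, data, and λ are hypothetical.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, accessing A only
    through matrix-vector products (A is never factorized)."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:   # relative residual small enough
            break
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p         # new A-conjugate search direction
        rs = rs_new
    return x

def gaussian_kernel(A, B, length_scale=0.5):
    sq = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-sq.sum(axis=-1) / (2 * length_scale**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0])
K = gaussian_kernel(X, X)
lam = 0.1
alpha = conjugate_gradient(lambda v: K @ v + lam * v, y)
```

Because the iteration converges to the solution of the full system, no entries of K are dropped and no low-rank approximation is introduced. The cost per iteration is a single product with K.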
January 2023 · 144 Reads · 34 Citations
Nucleic Acids Research
The molecular heterogeneity of cancer cells contributes to the often partial response to targeted therapies and relapse of disease due to the escape of resistant cell populations. While single-cell sequencing has started to improve our understanding of this heterogeneity, it offers a mostly descriptive view on cellular types and states. To obtain more functional insights, we propose scGeneRAI, an explainable deep learning approach that uses layer-wise relevance propagation (LRP) to infer gene regulatory networks from static single-cell RNA sequencing data for individual cells. We benchmark our method with synthetic data and apply it to single-cell RNA sequencing data of a cohort of human lung cancers. From the predicted single-cell networks, our approach reveals characteristic network patterns for tumor cells and normal epithelial cells and identifies subnetworks that are observed only in (subgroups of) tumor cells of certain patients. While current state-of-the-art methods are limited to predicting average networks for cell populations, our approach facilitates the reconstruction of networks down to the level of single cells, which can be utilized to characterize the heterogeneity of gene regulation within and across tumors.
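LRP explains a single prediction by redistributing the output score backward, layer by layer. Each layer's relevance is shared among its inputs in proportion to their contributions. The sketch below shows the generic LRP-ε rule on a small bias-free two-layer ReLU network, where relevance is approximately conserved across layers. It does not reproduce scGeneRAI's architecture, which the abstract does not describe, or how relevances are turned into gene-gene network edges. All names and sizes are hypothetical.

```python
import numpy as np

def lrp_epsilon(x, W1, W2, eps=1e-9):
    """LRP-epsilon for a bias-free two-layer ReLU network f(x) = relu(x @ W1) @ W2.
    Relevance starts as the output and is redistributed in proportion to
    each input's contribution z_jk = a_j * w_jk."""
    z1 = x @ W1
    a1 = np.maximum(z1, 0.0)
    out = a1 @ W2

    def stabilize(z):
        # Add eps with the sign of z (0 treated as positive) to avoid division by zero.
        return z + np.where(z >= 0, eps, -eps)

    R_hidden = a1 * W2 * (out / stabilize(out))          # output -> hidden units
    R_input = x * (W1 @ (R_hidden / stabilize(z1)))      # hidden units -> inputs
    return out, R_input

rng = np.random.default_rng(0)
x = rng.standard_normal(5)          # e.g. expression values of 5 input genes
W1 = rng.standard_normal((5, 8))
W2 = rng.standard_normal(8)
out, R = lrp_epsilon(x, W1, W2)
```

Inactive hidden units receive zero relevance. With zero biases and a tiny ε, the input relevances sum back to the network output, which is the conservation property that makes LRP scores comparable across inputs.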
August 2022 · 30 Reads · 1 Citation
Wiener klinisches Magazin
Given the rapid developments, there is no doubt that artificial intelligence (AI) will substantially impact pathological diagnostics. However, it remains an open question whether AI will primarily be another diagnostic tool, such as immunohistochemistry, or whether AI will also be able to replace human expertise. Most current studies on AI in histopathology deal with relatively simple diagnostic problems and are not yet capable of coping with the complexity of routine diagnostics. While some methods in molecular pathology would already be unthinkable without AI, it remains to be shown how AI will also be able to help with difficult histomorphological differential diagnoses in the future.
... Importantly, multiple instance learning allows ABMIL models to learn from specimen-level labels, not requiring exhaustive pixel-level annotations, which are time-consuming and costly to obtain 15. This feature makes ABMIL models particularly well-suited for tasks such as cancer detection 16,17, diagnosis 18–21, identification of primary cancer origin 22, grading 17,23,24, genomic aberration detection 25–28, molecular phenotyping 29–31, treatment response prediction 32–34, and prognostication 33,35–37. However, the widespread adoption of ABMIL models in clinical settings is hindered by challenges in model interpretability and trustworthiness 9,10,38,39. ...
August 2024
European Journal of Cancer
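ABMIL can learn from specimen-level labels because attention pooling collapses a whole bag of instance embeddings (for example, tiles from one slide) into one bag embedding. Only that bag embedding receives the label. The sketch below follows the attention pooling of Ilse et al. (2018), a_k = softmax_k(wᵀ tanh(V h_k)) and z = Σ_k a_k h_k, which we assume is the variant meant here. The shapes and random parameters are placeholders for learned weights.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Attention-based MIL pooling over one bag of instance embeddings.
    H: (n_instances, d) instance embeddings, e.g. tiles of one slide
    V: (d_att, d) and w: (d_att,) attention parameters (learned in practice)
    Returns the bag embedding (d,) and the attention weights (n_instances,)."""
    scores = np.tanh(H @ V.T) @ w       # one scalar score per instance
    scores = scores - scores.max()      # stabilize the softmax
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ H, a

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))         # 6 tiles, 4-dimensional embeddings
V = rng.standard_normal((3, 4))
w = rng.standard_normal(3)
z, a = attention_mil_pool(H, V, w)
```

A slide-level classifier is then trained on z alone. The attention weights a come for free and indicate which tiles drove the prediction, although, as the snippet notes, whether such weights are trustworthy explanations remains a separate question.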
... 9 Since then, there has been exponential progress for the current application of neuromodulation, including examining effective connectivity of brain networks 10,11, identifying epileptogenic areas and functional connectivity 12–14, brain mapping 15,16, cortical connectivity 9, and numerous other research applications 17–20. Stimulation of intracortical microelectrodes has been used more frequently to characterize single cell responses to electrical stimulation in primate 21 and human cortices 22–25. Despite such long and comprehensive study of local field potential (LFP) responses to SPES (i.e., cortico-cortical evoked potentials (CCEPs)), only one previous study examined single neuron firing rate responses to SPES, likening them to patterns of activity induced by interictal epileptiform activity. ...
May 2023
... MD22 [88] includes four classes of biomolecules and supramolecules, ranging from a small peptide with 42 atoms to a double-walled nanotube with 370 atoms. Geometries were extracted from MD trajectories at 400–500 K with a resolution of 1 fs, with potential energies and atomic forces calculated at the PBE+MBD level of theory. ...
Reference:
OpenQDC: Open Quantum Data Commons
January 2023
Science Advances
... Still, using these datasets presents several challenges because of limited patient numbers, batch effects, and sparse data (22). Consequently, inferring networks based on these datasets has proven to be particularly difficult (23,24,25). ...
January 2023
Nucleic Acids Research
... Adapting the idea from computer vision [Montavon et al., 2019], these methods try to map the prediction score to the input space. Schnake et al. [2021] proposed an approach named GNN-LRP that extends Layer-wise Relevance Propagation (LRP), originally developed for explaining convolutional neural networks (CNNs), to graph-structured data. GNN-LRP works by propagating relevance backward through GNN layers, redistributing the model's output across walks within the graph structure. ...
September 2021
IEEE Transactions on Pattern Analysis and Machine Intelligence
... 9 Since then, there has been exponential progress for the current application of neuromodulation, including examining effective connectivity of brain networks 10,11, identifying epileptogenic areas and functional connectivity 12–14, brain mapping 15,16, cortical connectivity 9, and numerous other research applications 17–20. Stimulation of intracortical microelectrodes has been used more frequently to characterize single cell responses to electrical stimulation in primate 21 and human cortices 22–25. Despite such long and comprehensive study of local field potential (LFP) responses to SPES (i.e., cortico-cortical evoked potentials (CCEPs)), only one previous study examined single neuron firing rate responses to SPES, likening them to patterns of activity induced by interictal epileptiform activity. ...
September 2021
... Electric fields may significantly influence catalytic activity, altering reaction pathways and energetics by modulating charge distributions and bonding character 258,259. Several distinct methods to incorporate electric fields in MLP-driven MD simulations have been reported 260–265. So far, applications of these methods have been limited to relatively simple systems such as liquid water 263,265 and molecules in vacuum or solution 260–262,264. ...
July 2021
... It uses multiple criteria and multi-harmony memories to discover a set of candidate high-order SNP combinations associated with disease status. DeepCOMBI [29] utilizes CNNs within a deep-learning framework to predict phenotypes from SNPs in the context of GWAS. This innovative method not only achieves superior accuracy in phenotype prediction but also enhances the identification of genetic markers associated with complex traits, all without requiring genotype imputation. ...
June 2021
NAR Genomics and Bioinformatics
... 1) Domain Adaptive Graph Anomaly Detection: We adopt the Hypersphere Classification (HSC) Loss [33], which is tailored for anomaly detection in scenarios with scarce anomaly labels. The core idea of this loss function is to cluster normal samples around a central point while ensuring that anomalous samples are kept at a distance. ...
July 2021
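The hypersphere idea described above can be written as a per-sample loss on embeddings. Normal samples pay a cost that grows with their distance from the center. Anomalies pay a cost that is large near the center and vanishes far away. The sketch below uses the formulation as we understand it: a pseudo-Huber radial function h(d) = √(d² + 1) − 1, loss h(d) for normal samples, and −log(1 − exp(−h(d))) for anomalies. Treat that as an assumption. The citing work's domain-adaptive graph component is not shown, and the names are hypothetical.

```python
import numpy as np

def pseudo_huber(d):
    return np.sqrt(d**2 + 1.0) - 1.0

def hsc_loss(Z, y, center=None, eps=1e-12):
    """Per-sample hypersphere classification loss.
    Z: (n, d) embeddings; y: (n,) with 0 = normal, 1 = anomalous.
    Normal samples pay h(dist), pulling them toward the center; anomalies pay
    -log(1 - exp(-h(dist))), which is large near the center and vanishes far away."""
    if center is None:
        center = np.zeros(Z.shape[1])
    h = pseudo_huber(np.linalg.norm(Z - center, axis=1))
    normal_term = h                                          # equals -log(exp(-h))
    anomaly_term = -np.log(np.maximum(-np.expm1(-h), eps))   # -log(1 - exp(-h)), clipped
    return np.where(y == 1, anomaly_term, normal_term)

Z = np.array([[0.0, 0.0], [100.0, 0.0], [0.1, 0.0]])
losses = hsc_loss(Z, np.array([0, 1, 1]))
```

The three losses behave as intended. A normal point at the center costs nothing, a distant anomaly costs almost nothing, and an anomaly near the center is heavily penalized.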
... Sparse Bayesian Learning (SBL) pioneered this approach by optimizing a Type-II ML cost function to enhance source estimation in noisy environments [24]. Building on SBL, hierarchical Bayesian inference techniques have extended this framework to simultaneously estimate both sources and noise, addressing various noise structures, including homoscedastic [25], heteroscedastic [26], and full-structure [27] noise. These methods have been further extended to include coherent spatial source clusters as in cMEM [28]. ...
June 2021
NeuroImage
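Type-II maximum likelihood in SBL means tuning per-source prior variances γ to maximize the marginal likelihood of the measurements, rather than fitting the sources directly. Sources whose γ shrinks toward zero are pruned. The sketch below uses the classic expectation-maximization update for the model y = Lx + noise, with x ~ N(0, diag(γ)). It assumes the noise variance is known and homoscedastic. The noise-learning and full-structure extensions cited above are not implemented, and the lead-field matrix and data are synthetic placeholders.

```python
import numpy as np

def sbl_em(L, y, noise_var, n_iter=300):
    """Sparse Bayesian learning via EM updates of the source variances gamma.
    L: (n_sensors, n_sources) lead field, y: (n_sensors,) measurements."""
    n_sensors, n_sources = L.shape
    gamma = np.ones(n_sources)
    for _ in range(n_iter):
        LG = L * gamma                                   # L @ diag(gamma)
        Sigma_y = noise_var * np.eye(n_sensors) + LG @ L.T
        mu = LG.T @ np.linalg.solve(Sigma_y, y)          # posterior mean of sources
        # Diagonal of the posterior covariance: gamma - diag(Gamma L^T Sigma_y^-1 L Gamma)
        post_var = gamma - np.sum(LG * np.linalg.solve(Sigma_y, LG), axis=0)
        gamma = np.maximum(mu**2 + post_var, 0.0)        # EM update, clipped at zero
    return mu, gamma

rng = np.random.default_rng(0)
L = rng.standard_normal((30, 40))
x_true = np.zeros(40)
x_true[[5, 17]] = [2.0, -2.0]
y = L @ x_true + 0.01 * rng.standard_normal(30)
mu, gamma = sbl_em(L, y, noise_var=1e-4)
```

Even though there are more sources than sensors, the largest posterior means fall on the two active sources.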