Jonathan Warrell’s research while affiliated with NEC Laboratories America and other places


Publications (63)


Quantum variational autoencoder utilizing regularized mixed-state latent representations
  • Article

April 2025

Physical Review A

Gaoyuan Wang

·

Jonathan Warrell

·

Prashant S. Emani

·

Mark Gerstein

Variational methods for Learning Multilevel Genetic Algorithms using the Kantorovich Monad

November 2024

·

3 Reads

Jonathan Warrell

·

Francesco Alesiani

·

Cameron Smith

·

[...]

·

Martin Renqiang Min

Levels of selection and multilevel evolutionary processes are essential concepts in evolutionary theory, and yet there is a lack of common mathematical models for these core ideas. Here, we propose a unified mathematical framework for formulating and optimizing multilevel evolutionary processes and genetic algorithms over arbitrarily many levels based on concepts from category theory and population genetics. We formulate a multilevel version of the Wright-Fisher process using this approach, and we show that this model can be analyzed to clarify key features of multilevel selection. In particular, we derive an extended multilevel probabilistic version of Price's Equation via the Kantorovich Monad, and we use this to characterize regimes of parameter space within which selection acts antagonistically or cooperatively across levels. Finally, we show how our framework can provide a unified setting for learning genetic algorithms (GAs), and we show how Variational Optimization and a multilevel analogue of coalescent analysis can be used to fit multilevel GAs to simulated data.
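
For orientation, the classical single-level Price equation that this abstract extends can be checked numerically. The sketch below is not the authors' multilevel framework: it uses the standard textbook form, change in mean trait = Cov(w, z)/mean(w) + E[w * dz]/mean(w), and the function names and toy numbers are illustrative assumptions.

```python
def price_decomposition(z, w, z_offspring):
    """Split the change in mean trait into selection and transmission terms.

    z           -- trait value of each parent
    w           -- fitness (offspring count) of each parent
    z_offspring -- mean trait value among each parent's offspring
    All three names are hypothetical, chosen only for this illustration.
    """
    n = len(z)
    w_bar = sum(w) / n
    z_bar = sum(z) / n
    # Covariance between fitness and trait (population covariance).
    cov_wz = sum(wi * zi for wi, zi in zip(w, z)) / n - w_bar * z_bar
    # Fitness-weighted expectation of the parent-to-offspring trait change.
    exp_w_dz = sum(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_offspring)) / n
    return cov_wz / w_bar, exp_w_dz / w_bar


def mean_trait_change(z, w, z_offspring):
    """Directly compute the change in mean trait across one generation."""
    z_bar = sum(z) / len(z)
    z_bar_next = sum(wi * zo for wi, zo in zip(w, z_offspring)) / sum(w)
    return z_bar_next - z_bar


if __name__ == "__main__":
    z = [1.0, 2.0, 4.0]
    w = [1, 2, 3]
    z_off = [1.0, 2.5, 4.0]
    selection, transmission = price_decomposition(z, w, z_off)
    # The two terms should add up to the directly computed change.
    print(selection, transmission, mean_trait_change(z, w, z_off))
```

A positive first term with a negative second term (or the reverse) is the single-level analogue of the opposing contributions that the abstract describes across levels.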


1230 Design of enhanced TCR against cancer antigens using an AI system
  • Conference Paper
  • Full-text available

November 2024

·

26 Reads

Background Naturally occurring TCRs targeting cancer antigens have relatively low affinity compared with TCRs targeting external pathogens. This might be explained by the proximity of cancer-specific sequences to self. Engineering modified, affinity-enhanced TCRs is a possible solution; however, TCR binding remains challenging to model using structural biology approaches because of the conformational flexibility of the TCR complex. Machine-learning-based methods are a promising approach to designing TCRs of higher affinity. Herein, we report enhanced-affinity TCR sequences against cancer antigens designed using TCRPPO, a proprietary pipeline for TCR sequence optimization. Methods TCRPPO is a new reinforcement-learning framework based on proximal policy optimization to optimize TCRs through a mutation policy. Briefly, after training the system on a series of TCR sequences known to bind a given target, TCRPPO introduces mutations into an existing sequence to achieve higher affinity, guided by a reward function factoring in the affinity of the new sequence and the likelihood that this sequence is a valid TCR. To validate our approach, we designed a series of candidate TCR sequences against known clinically relevant cancer antigens (KRAS G12V and MART-1) and evaluated their biological functional potency. To do so, genes encoding variable regions of the original and optimized TCRα and β chains were assembled into plasmid vectors containing a constant region of a TCRα or TCRβ chain. TAP fragments of TCRα and TCRβ together with an NFAT-Luc reporter plasmid were transfected into the ΔTCR Jurkat cell line. The cells were cultured in the presence of antigen-presenting cells with or without target peptide, and then the activation of the reporter gene was measured by luciferase assay. Results Our AI-based TCR engineering approach generated valid enhanced TCR sequences against the selected epitopes.
Cells transfected with engineered TCRs showed higher activity in the functional assay, demonstrating that TCRs generated using a mutation policy can achieve higher biological activity than endogenous TCRs. Enhanced TCRs generated against KRAS G12V and MART-1 are dissimilar from previously described TCRs.¹ Conclusions We successfully engineered TCRs to have better antigen recognition. The enhanced TCRs warrant further characterization to evaluate their therapeutic potential. Beyond this case, our approach constitutes a pipeline that might be applied to other targets for which alternative TCRs are required. Reference • Chen Z, Min MR, Guo H, Cheng C, Clancy T, Ning X. T-Cell receptor optimization with reinforcement learning and mutation polices for precision immunotherapy. In: Tang H (ed). Research in Computational Molecular Biology. RECOMB 2023. Lecture Notes in Computer Science, vol 13976. Springer, Cham. https://doi.org/10.1007/978-3-031-29119-7_11.


Figure 2. Average validation and test accuracy during training on the regulator datasets. For basic GCN and GTN models, we show a simple average over all samples. (A) Regulator PDB dataset: PCGCN with cp = (2, 0.1, 0.1), best epoch t_best = 59; PCGTN with cp = (1, 0.1, 0.1), best epoch t_best = 53. (B) Regulator AlphaFold dataset: PCGCN with cp = (8, 2.5, 0.1), best epoch t_best = 59; PCGTN with cp = (8, 3.5, 0.1), best epoch t_best = 58.
Figure 3. PCGTN performance on the scaffold AlphaFold dataset. 2D plots showing the p_fac and C_1 dependency of the validation accuracy of the PCGTN at C_2 = 0.1; the optimal C_2 value is only screened using the most promising p_fac and C_1 choices. (A) Average validation accuracy in the best-performing epoch. (B) Peak validation accuracy in the best-performing epoch. (C) Average validation accuracy and test accuracy of PCGTN against number of meta-epochs for the best model cp = (8, 2.5, 0.2) and the second-best model cp = (7, 3.0, 0.1). PCGTN models show clear improvement through the variational optimization and gradual outperformance with respect to the basic GTN. The best meta-epoch of cp = (8, 2.5, 0.2) in this example is found to be 55, and for cp = (7, 3.0, 0.1), it is meta-epoch 40. MA val. shows the area between moving average ± moving standard deviation with window size 5. (D) ROC curves for the peak performing model and the average over all models in the best meta-epoch.
Figure 5. Evaluation of the biological relevance of clusters identified by PCGTN by comparing them with protein disordered region annotations from UniProt. (A) The difference between median overlap score from signal (LLPS) and background proteins. Zeros are in white and indicate no difference between the medians of the two groups, positive values are in blue and indicate that the signal group has a larger median, and negative values are in red and indicate that the background group has a larger median. (B) Overlap score right-hand Mann-Whitney p value; darker colors indicate a smaller p value, which indicates that the observation (signal group has a larger overlap score than the background group) is more significant. (C) The difference between mean value of |1 − cluster size ratio| of signal and background proteins. Zeros are in white and indicate no difference between the two groups, positive values are in blue and indicate that background proteins have cluster size ratios closer to 1, and negative values are in red and indicate that signal proteins have cluster size ratios closer to 1. (D) For three of the best models from the performance test, the distribution of overlap scores for signal and background proteins is shown. See also Figures S3 and S4.
A variational graph-partitioning approach to modeling protein liquid-liquid phase separation

November 2024

·

5 Reads

Cell Reports Physical Science

Graph neural networks (GNNs) have emerged as powerful tools for representation learning. Their efficacy depends on their having an optimal underlying graph. In many cases, the most relevant information comes from specific subgraphs. In this work, we introduce a GNN-based framework (graph-partitioned GNN [GP-GNN]) to partition the GNN graph to focus on the most relevant subgraphs. Our approach jointly learns task-dependent graph partitions and node representations, making it particularly effective when critical features reside within initially unidentified subgraphs. Protein liquid-liquid phase separation (LLPS) is a problem especially well-suited to GP-GNNs because intrinsically disordered regions (IDRs) are known to function as protein subdomains in it, playing a key role in the phase separation process. In this study, we demonstrate how GP-GNN accurately predicts LLPS by partitioning protein graphs into task-relevant subgraphs consistent with known IDRs. Our model achieves state-of-the-art accuracy in predicting LLPS and offers biological insights valuable for downstream investigation.


Enhancing Immunotherapy Outcomes: Spatial Multi-Omics Models for Non-Small Cell Lung Cancer

September 2024

·

76 Reads

Non-small cell lung cancer (NSCLC) has an increasing number of targeted and systemic therapies where subsets of patients have long-term durable benefit. Fundamental to understanding responses to a given therapy is comprehensive molecular characterization of the underlying tumor immune microenvironment (TIME). The TIME may inform models to predict immunotherapy outcomes and features to delineate therapeutic responses and clinical endpoints. We hypothesize that an integrated multi-omics approach will uncover interactions within the NSCLC TIME and identify novel biomarkers that are predictive of immunotherapy responses, thus aiding precision oncology. We aimed to develop a spatially resolved TIME model for NSCLC immunotherapy using unbiased spatial proteomic and whole transcriptome profiling. We utilized a multi-omics approach, combining spatial mapping of protein expression at single-cell resolution by Phenocycler Fusion (PCF) and multi-cellular readout whole transcriptome profiling at cellular compartment resolution by Digital Spatial Profiling (DSP-GeoMx-WTA). This approach facilitated a detailed examination of the TIME in NSCLC samples from patients undergoing first-line immunotherapy. We studied two independent cohorts of advanced NSCLC tissue samples, treated with PD-1-based immunotherapies. We derived gene signatures from cell type signatures to predict treatment outcomes using a Least Absolute Shrinkage and Selection Operator (LASSO) model. Our spatial proteomic analysis identified three distinct cell types, proliferating tumor cells, granulocytes, and vessels, associated with resistance to immunotherapy. A high proportion of these cell types demonstrated a hazard ratio (HR) of 3.8 (p = 0.004) in the training cohort and 1.8 (p = 0.05) in the validation cohort. In the response cell-type model, higher levels of M1 macrophages, M2 macrophages, and CD4 T cells had an HR of 0.4 (p = 0.019) in the training cohort and 0.49 (p = 0.036) in the validation cohort.
In the transcriptomic analysis, gene signatures derived from these cell types predicted outcomes with high accuracy. The resistance gene model, which included 8 genes associated with epithelial-mesenchymal transition (EMT) and cell migration, showed an HR of 5.3 (p < 0.001) in the training cohort and 2.2 (p = 0.036) in the validation cohort. Conversely, the response gene model, consisting of 8 genes associated with immunomodulation, had an HR of 0.22 (p = 0.005) in the training cohort and 0.38 (p = 0.034) in the validation cohort. This research highlights the potential of a multi-omics cell typing and gene expression profiling approach in advancing NSCLC treatment toward precision oncology. By offering insights into the TIME and unveiling novel biomarkers, our model seeks to define resistance and to improve the prediction of response to treatment.


Predicting spatially resolved gene expression via tissue morphology using adaptive spatial GNNs

September 2024

·

5 Reads

·

2 Citations

Bioinformatics

Motivation Spatial transcriptomics technologies, which generate a spatial map of gene activity, can deepen the understanding of tissue architecture and its molecular underpinnings in health and disease. However, the high cost makes these technologies difficult to use in practice. Histological images co-registered with targeted tissues are more affordable and routinely generated in many research and clinical studies. Hence, predicting spatial gene expression from the morphological clues embedded in tissue histological images provides a scalable alternative approach to decoding tissue complexity. Results Here, we present a graph neural network based framework to predict the spatial expression of highly expressed genes from tissue histological images. Extensive experiments on two separate breast cancer data cohorts demonstrate that our method improves the prediction performance compared to the state-of-the-art, and that our model can be used to better delineate spatial domains of biological interest. Availability and implementation https://github.com/song0309/asGNN/


Spatially Informed Gene Signatures for Response to Immunotherapy in Melanoma

June 2024

·

65 Reads

·

4 Citations

Clinical Cancer Research

Purpose We aim to improve the prediction of response or resistance to immunotherapies in patients with melanoma. This goal is based on the hypothesis that current gene signatures predicting immunotherapy outcomes show only modest accuracy due to the lack of spatial information about cellular functions and molecular processes within tumors and their microenvironment. Experimental Design We collected gene expression data spatially from three cellular compartments defined by CD68+ macrophages, CD45+ leukocytes, and S100B+ tumor cells in 55 immunotherapy-treated melanoma specimens using Digital Spatial Profiling–Whole Transcriptome Atlas. We developed a computational pipeline to discover compartment-specific gene signatures and determine if adding spatial information can improve patient stratification. Results We achieved robust performance of compartment-specific signatures in predicting the outcome of immune checkpoint inhibitors in the discovery cohort. Of the three signatures, the S100B signature showed the best performance in the validation cohort (N = 45). We also compared our compartment-specific signatures with published bulk signatures and found the S100B tumor spatial signature outperformed previous signatures. Within the eight-gene S100B signature, five genes (PSMB8, TAX1BP3, NOTCH3, LCP2, and NQO1) with positive coefficients predict the response, and three genes (KMT2C, OVCA2, and MGRN1) with negative coefficients predict the resistance to treatment. Conclusions We conclude that the spatially defined compartment signatures utilize tumor and tumor microenvironment–specific information, leading to more accurate prediction of treatment outcome, and thus merit prospective clinical assessment.


Predicting Spatially Resolved Gene Expression via Tissue Morphology using Adaptive Spatial GNNs

June 2024

·

6 Reads

Motivation: Spatial transcriptomics technologies, which generate a spatial map of gene activity, can deepen the understanding of tissue architecture and its molecular underpinnings in health and disease. However, the high cost makes these technologies difficult to use in practice. Histological images co-registered with targeted tissues are more affordable and routinely generated in many research and clinical studies. Hence, predicting spatial gene expression from the morphological clues embedded in tissue histological images provides a scalable alternative approach to decoding tissue complexity. Results: Here, we present a graph neural network based framework to predict the spatial expression of highly expressed genes from tissue histological images. Extensive experiments on two separate breast cancer data cohorts demonstrate that our method improves the prediction performance compared to the state-of-the-art, and that our model can be used to better delineate spatial domains of biological interest. Availability: https://github.com/song0309/asGNN/


Single-cell genomics and regulatory networks for 388 human brains

May 2024

·

332 Reads

·

43 Citations

Science

Single-cell genomics is a powerful tool for studying heterogeneous tissues such as the brain. Yet little is understood about how genetic variants influence cell-level gene expression. Addressing this, we uniformly processed single-nuclei, multiomics datasets into a resource comprising >2.8 million nuclei from the prefrontal cortex across 388 individuals. For 28 cell types, we assessed population-level variation in expression and chromatin across gene families and drug targets. We identified >550,000 cell type–specific regulatory elements and >1.4 million single-cell expression quantitative trait loci, which we used to build cell-type regulatory and cell-to-cell communication networks. These networks manifest cellular changes in aging and neuropsychiatric disorders. We further constructed an integrative model accurately imputing single-cell expression and simulating perturbations; the model prioritized ~250 disease-risk genes and drug targets with associated cell types.


Abstract 1141: Spatial-specific gene signatures to predict immunotherapy outcomes in lung cancer

March 2024

·

12 Reads

Cancer Research

The widespread use of immunotherapy in lung cancer, and its more recent approval for early stages, underscores the need for biomarkers that can identify the most responsive patients. Spatial transcriptomics, which maps gene expression in its spatial tissue context, offers a unique approach over bulk transcriptomics. By incorporating spatial information, the predictive power of the signature can be enhanced. Here, we aim to develop spatially informed gene signatures that could be translated into clinical RNA in situ assays, distinguishing patients unlikely to benefit from immunotherapy and thereby sparing them unnecessary side effects. We utilized the NanoString GeoMX Whole Transcriptome Atlas for spatially resolved transcriptomic profiling of retrospectively collected lung cancer tissue samples from patients treated with immunotherapy in an advanced-stage setting (N=60). By targeting 18,190 genes within distinct areas of interest (AOIs), including stromal (macrophages/CD68+ and leukocytes/CD45+) and tumor (cytokeratin, CK+) cells, we developed AOI-specific gene signatures to predict objective responses. These were derived from a robust computational framework employing LASSO logistic regression on a split-sample approach, yielding predictive models for treatment outcome. We achieved high predictive accuracy on the training set, with the area under the curve (AUC) exceeding 0.86 for all AOI-specific spatial signatures, indicating strong potential for clinical application. Validation against an independent cohort (N=42) corroborated the efficacy of these signatures. Our 6-gene tumor signature was validated with an AUC of 0.73 (95% CI: 0.67-0.89, p = 0.009**), while the CD45 5-gene signature showed an AUC of 0.75 (95% CI: 0.53-0.97, p = 0.022*). Our 18-gene CD68 signature trended towards validation but lacked statistical significance. A combined CD68 and CD45 signature predicted outcomes with greater accuracy, achieving an AUC of 0.79 (95% CI: 0.62-0.98, p = 0.0088*).
Following gene set enrichment analysis on our differentially expressed and signature genes, we identified genes with positive coefficients in both tumor and stroma signatures that were associated with glucocorticoid response and glycolytic processes linked to T-cell homeostasis, while genes with negative coefficients were associated with epithelial cell differentiation and cytokine production. These associations are concordant with the observation that genes with positive coefficients are predictors of treatment response, while those with negative coefficients indicate resistance. Our findings indicate that AOI-specific signatures predict the immunotherapy outcome in lung cancer with high accuracy, suggesting that spatial assessment can provide substantial predictive information. The high performance of these signatures indicates their potential for prospective clinical applications. Citation Format: Thazin N. Aung, Myrto Moutafi, Ioannis Trontzas, Arutha Kulasinghe, James Monkman, Niki Gavrielatou, Ioannis Vathiotis, Jonathan H. Warrell, David L. Rimm. Spatial-specific gene signatures to predict immunotherapy outcomes in lung cancer [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 1141.
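
As a reading aid for the AUC values reported in this abstract: an AUC is the probability that a randomly chosen responder receives a higher signature score than a randomly chosen non-responder, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is a generic illustration, not the authors' pipeline; the function name, the 1/0 label coding, and the toy scores are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- predicted signature score per patient (hypothetical values)
    labels -- 1 for responder, 0 for non-responder (assumed coding)
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # responder ranked above non-responder
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: one responder is ranked below one non-responder.
    print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported validation values sit.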


Citations (36)


... Recent reports in lung and colon cancers have shown that genotypical profiles (e.g., mutations [97,98], gene expressions [99], molecular subtypes [100]) of tumors can be predicted solely by examining routine inexpensive digitized H&E slides. These algorithms work by identifying subtle patterns that are challenging for reproducible human quantification; however, this technology is still in its early stages and not yet capable of reliably replacing traditional molecular testing [101]. Although these technologies have limited/no track record when applied to MPTN, it stands to reason that employing this approach could equally detect minute differences among tumors which equate with non-clonality, implying tumors are unrelated and therefore not IPM. ...

Reference:

Differentiating separate primary lung adenocarcinomas from intrapulmonary metastases with emphasis on pathological and molecular considerations: Recommendations from the IASLC Pathology Committee
Predicting spatially resolved gene expression via tissue morphology using adaptive spatial GNNs
  • Citing Article
  • September 2024

Bioinformatics

... Thus, MGRN1 appeared important for genomic stability and its downregulation was sufficient to trigger a cell-autonomous transcriptional program strongly influencing the immunologic behavior of MM cells. Interestingly, a recent study identified MGRN1 in an 8-gene signature predicting the response to immunotherapy in MM, further supporting its role in modulating the immune microenvironment of the tumor [70]. Also of note, the GSEA of the SKCM cohort indicated a similar transcriptional landscape in tumors with low expression of MGRN1, suggesting comparable transcriptional effects of MGRN1 in cultured MM cells and the tumors. ...

Spatially Informed Gene Signatures for Response to Immunotherapy in Melanoma
  • Citing Article
  • June 2024

Clinical Cancer Research

... To annotate the cell-specific functions of candidate elements, we integrated single-cell ATAC-seq data from three studies [17,27,28], encompassing chromatin accessibility profiles for 8 brain cell types, in our deep learning model. To this end, our model profiles a set of predicted enhancers and silencers for nine cellular contexts: the DLPFC and eight distinct brain cell types. ...

Single-cell genomics and regulatory networks for 388 human brains
  • Citing Article
  • May 2024

Science

... A phylogeny of Gabon folk music patrimonies has been proposed [20], which highlighted the role of vertical transmission, compared to the focus on lateral transmission in evolutionary linguistics [6]. Bioinformatics tools have been used to characterize diversity in Japanese vs. English and US folk melodies directly [21], or electronic music [22]; most recently, Billboard songs have been studied using methods from evolutionary cancer genomics reformulated as a variational autoencoder [23]. We also must mention the Cantometrics project [24], which proposed a high-level evolutionary tree of folk music worldwide. ...

Latent evolutionary signatures: a general framework for analysing music and cultural evolution

... To overcome the aforementioned issues, we propose an adaptive spatial Graph Neural Network (asGNN) for spatial gene expression prediction, which builds on the smoothing based GNN (SBGNN) framework of [3]. The SBGNN framework was developed to predict liquid-liquid phase separation from 3D molecular graphs, by using graph structure to adaptively refine molecular graphs to remove task irrelevant edges to help perform graph classification. ...

A Variational Graph Partitioning Approach to Modeling Protein Liquid-liquid Phase Separation

... We ran the SIGNAL webtool (https://signal.mutationalsignatures.com/) (Degasperi et al. 2020) under default parameters to identify the single base substitution (SBS) signatures active in each patient (Alexandrov et al. 2020). As recommended by the authors, SBS fitting was performed using candidate SBS signatures from CRC. ...

Author Correction: The repertoire of mutational signatures in human cancer

Nature

... In clinical practice, differentiating between multiple primary lung cancers and intrapulmonary metastasis is challenging using pathology or morphologic appearance alone, and NGS is now recommended to help definitively differentiate these entities [5, 8-15]. When using genomic results to define multiple primaries versus intrapulmonary metastasis, we observed significantly improved survival in those with genome-defined multiple primaries versus intrapulmonary metastasis. ...

Author Correction: The evolutionary history of 2,658 cancers

Nature

... Characterizing the full spectrum of SVs in human genomes makes it possible to understand the distribution of SVs in different populations [8]. Additionally, SVs have been found to play a critical role in the development of certain diseases, such as cancer [22] and Alzheimer's [36]. SVs are typically referred to any genome sequence altering event, except for deletions or insertions (indels) shorter than 50 bp [32]. ...

Author Correction: Patterns of somatic structural variation in human cancer genomes

Nature

... RPL39L is a recently evolved (1) and non-redundant paralog of RPL39L that has just been implicated in the translation of long-lived, sperm cell-specific proteins (11). However, the RPL39L mRNA was also observed outside of the germ cell lineage, particularly in ovarian (12) and breast cancer tissues (2), as well as in lung cancer (13) and neuroblastoma (11) cell lines, where the expression appears to be driven by gene amplifications (14) and CpG island hypomethylation (13). These observations suggest that RPL39L's function extends beyond the translation of long-lived sperm cell proteins. ...

Author Correction: Genomic basis for RNA alterations in cancer

Nature

... Genomic alterations have been ranked based on their recurrence and on their functional consequences, finally developing a clustering methodology to discriminate between potential driver events [46]. The extension of the sequencing to intergenic regions allowed evaluation of the burden of putative driver mutations in noncoding regions: on the pan-cancer database, 13% of all mutations were represented by driver point-mutation events in an intergenic region, with 25% of all PCAWG cancers analysed bearing at least one, one-third of which occurred in the TERT promoter, confirming its role in cancer [73][74][75][76][77][78][79]. On the counterpart, 91% of all cancers harboured a somatic driver event in a coding region of a gene (Fig. 3). ...

Author Correction: High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations