Dongjoo Lee’s research while affiliated with Seoul National University and other places


Publications (21)


Abstract B038: A self-supervised AI model leveraging spatial omics for analyzing tumor microenvironment heterogeneity in breast cancer only with H&E
  • Article

March 2025

·

1 Read

Cancer Research

Haenara Shin

·

Dongjoo Lee

·

Yooeun Kim

·

[...]

·

Hongyoon Choi

Background: Digital high-resolution H&E images provide valuable insights into tumor heterogeneity within the tumor microenvironment (TME). However, the ability to perform detailed cell typing solely based on H&E images remains limited. Recent advances in high-resolution spatial transcriptomics (ST) enable precise characterization of cell types and their spatial relationships within the TME, addressing challenges in understanding tumor heterogeneity. In this study, we developed AI models to predict cell types in breast cancer—including subtypes of lymphocytes that are challenging to differentiate visually—using large-scale image-based ST data aligned with H&E images. Methods: We established a breast cancer image-based ST database comprising 190 samples from 113 breast cancer patients, obtained from surgically resected primary tumors. Image-based ST data were generated using the Xenium platform with a 500-gene panel and matched with high-resolution H&E images. Cell type maps were constructed using reference single-cell RNA-seq data and transferred to ST data. The ST data were spatially registered to the corresponding H&E images, and cell type masks were generated at matching resolutions. AI models were trained to segment cell type masks, including epithelial cells, cancer cells, myeloid cells, fibroblasts, endothelial cells, T cells, and B cells. Additionally, refined cell type predictions were performed for dendritic cells, NK cells, CD4+ T cells, and CD8+ T cells. Model performance was validated using external whole-slide image datasets containing ST data from four independent cohorts. Results: Model performance was assessed using the area under the receiver operating characteristic (AUROC) curve for each cell type mask. Internal validation on four samples yielded AUROC values ranging from 0.90 to 0.95. External validation across independent whole-slide datasets demonstrated AUROC values between 0.88 and 0.97 for all cell type masks. 
The AI model successfully mapped TME cell types with high resolution, using only H&E images. Conclusion: By employing a self-supervised approach that integrates high-resolution H&E images with image-based ST data, we developed AI-driven tools for TME analysis. These models enable accurate identification of detailed cell types and their spatial relationships within the TME. This approach facilitates large-scale analysis of breast cancer TME and holds potential for advancing our understanding of tumor biology and therapeutic strategies. Citation Format: Haenara Shin, Dongjoo Lee, Yooeun Kim, Daeseung Lee, Kwon Joong Na, Chihwan David Cha, Hosub Park, Hongyoon Choi. A self-supervised AI model leveraging spatial omics for analyzing tumor microenvironment heterogeneity in breast cancer only with H&E [abstract]. In: Proceedings of the AACR Special Conference in Cancer Research: Functional and Genomic Precision Medicine in Cancer: Different Perspectives, Common Goals; 2025 Mar 11-13; Boston, MA. Philadelphia (PA): AACR; Cancer Res 2025;85(5 Suppl):Abstract nr B038.
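
The per-cell-type AUROC reported above can be illustrated with a minimal sketch. It assumes each cell type mask has been flattened into a list of binary ground-truth labels and a matching list of predicted probabilities; the function names `auroc` and `per_class_auroc` and this data layout are hypothetical and are not taken from the abstract.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a random positive pixel
    receives a higher score than a random negative one (ties count 0.5)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied items occupy ranks i+1 .. j; each gets the average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # Mann-Whitney U statistic for the positives, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def per_class_auroc(truth_masks, predicted_probs):
    """Compute one AUROC per cell type from dicts of flattened masks
    (assumed layout: {cell_type: [values, ...]})."""
    return {name: auroc(truth_masks[name], predicted_probs[name])
            for name in truth_masks}
```

Calling `per_class_auroc` with one entry per cell type (for example T cells and B cells) yields one score per mask, which matches the way the abstract reports a range of AUROC values across cell types.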


Workflow of the Chatbot System for Querying PET Imaging Reading Reports. The overall workflow of the proof-of-concept system designed for efficient querying of reading reports from a substantial dataset is illustrated. The system integrates the Retrieval-Augmented Generation (RAG) model with advanced language model technologies, natural language processing, and information retrieval techniques. The workflow demonstrates the process from user query input through to the delivery of the relevant reading report, showcasing the operational framework and interaction with different sources of reading reports
Visualization of PET Imaging Report Embeddings Using t-SNE. (A) The t-SNE plot illustrates PET imaging report embeddings from 118,107 patients, totaling 211,813 cases. Each point on the plot represents a unique report, with a selected case highlighted in red to show an example of an original report. (B) The t-SNE plots showcase the clustering efficacy of the embeddings, highlighting how reports containing key diagnostic terms like ‘lung cancer’, ‘breast cancer’, ‘lymphoma’, and specific types of exams such as ‘C-11 methionine PET’ and ‘Ga-68 PSMA-11 PET’ form distinct clusters. These clusters indicate the embeddings’ capability to reflect the similarity among cases, demonstrating the potential of this method in facilitating the identification and visualization of related PET imaging reports
Examples of Chatbot Responses to Queries. (A) An example case shows the chatbot’s capability to accurately identify and present relevant cases in response to a user query about breast cancer with metastasis to internal mammary lymph nodes. It highlights the capacity to navigate a vast database of previous reading reports to identify relevant cases. (B) An example of the utility of the system in generating differential diagnoses is displayed. This is demonstrated through the chatbot’s response to a query, where it offers a detailed list of potential diagnoses along with reference identifiers. As an example, by employing identifiers within the PACS system (in this example, we used deidentified information), prior imaging cases could be referenced for understanding cases and supporting decision making
Evaluation of Appropriateness Scores by Nuclear Medicine Physicians. (A) The appropriateness of querying similar cases was assessed. Using a conclusion text to generate the prompt “find similar reports and summarize it,” the system retrieved results. For specific reports, 16 out of 19 (84.2%) were appropriately identified, with all three readers rating these as better than ‘Fair’ in relevance. (B) The appropriateness of potential diagnoses for specific findings was evaluated. Using specific finding texts to generate prompts for suggesting potential diagnoses, the responses of the system were assessed. Medical relevance and appropriateness of the suggested potential diagnoses were evaluated by readers. The system without RAG was also assessed, and the performance of the LLM with and without RAG was represented as a heatmap. The results indicated that the LLM with RAG showed significantly better appropriateness scores (p < 0.05). (C) The ROUGE-L F-score was used to quantitatively evaluate the alignment between generated conclusions and reference conclusion texts from finding descriptions. The RAG framework demonstrated significantly higher scores compared to the LLM without RAG (0.16 ± 0.08 vs. 0.07 ± 0.03, p < 0.001)
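
The ROUGE-L F-score mentioned in panel (C) is based on the longest common subsequence (LCS) between a generated text and a reference text. The sketch below is an illustration under stated assumptions: it uses lowercase whitespace tokenization and equal weighting of precision and recall (F1), neither of which is specified in the caption, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 with whitespace tokenization (an assumption; the
    tokenizer used in the study is not stated in the caption)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because the score rewards in-order token overlap only, free-text conclusions that are clinically equivalent but worded differently can still score low, which is one reason absolute values such as those in the caption are best read comparatively (with RAG versus without RAG).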
Empowering PET imaging reporting with retrieval-augmented large language models and reading reports database: a pilot single center study
  • Article
  • Full-text available

January 2025

·

28 Reads

·

1 Citation

European Journal of Nuclear Medicine and Molecular Imaging

Purpose The potential of Large Language Models (LLMs) in enhancing a variety of natural language tasks in clinical fields includes medical imaging reporting. This pilot study examines the efficacy of a retrieval-augmented generation (RAG) LLM system considering zero-shot learning capability of LLMs, integrated with a comprehensive database of PET reading reports, in improving reference to prior reports and decision making. Methods We developed a custom LLM framework with retrieval capabilities, leveraging a database of over 10 years of PET imaging reports from a single center. The system uses vector space embedding to facilitate similarity-based retrieval. Queries prompt the system to generate context-based answers and identify similar cases or differential diagnoses. From routine clinical PET readings, experienced nuclear medicine physicians evaluated the performance of the system in terms of the relevance of queried similar cases and the appropriateness score of suggested potential diagnoses. Results The system efficiently organized embedded vectors from PET reports, showing that imaging reports were accurately clustered within the embedded vector space according to the diagnosis or PET study type. Based on this system, a proof-of-concept chatbot was developed and showed the framework’s potential in referencing reports of previous similar cases and identifying exemplary cases for various purposes. From routine clinical PET readings, 84.2% (16 of 19) of the cases retrieved relevant similar cases, as agreed upon by all three readers. Using the RAG system, the appropriateness score of the suggested potential diagnoses was significantly better than that of the LLM without RAG. Additionally, it demonstrated the capability to offer differential diagnoses, leveraging the vast database to enhance the completeness and precision of generated reports. 
Conclusion The integration of RAG LLM with a large database of PET imaging reports suggests the potential to support clinical practice of nuclear medicine imaging reading by various tasks of AI including finding similar cases and deriving potential diagnoses from them. This study underscores the potential of advanced AI tools in transforming medical imaging reporting practices.
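
The similarity-based retrieval step described in the Methods can be sketched as follows. This is a minimal illustration, not the system from the study: the word-count `embed` function is a toy stand-in for a learned embedding model, the report texts and identifiers in `REPORTS` are made up for the example, and `build_prompt` only shows how retrieved reports might be placed into an LLM prompt.

```python
import math
from collections import Counter

# Made-up toy reports with hypothetical identifiers (not real data).
REPORTS = {
    "R001": "hypermetabolic breast mass with internal mammary lymph node metastasis",
    "R002": "lymphoma with multiple hypermetabolic cervical lymph nodes",
    "R003": "no abnormal uptake in the lung",
}


def embed(text):
    """Toy embedding: a sparse word-count vector. A real system would
    use a learned text-embedding model here (assumption)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, reports, k=2):
    """Return the ids of the k reports most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), report_id)
              for report_id, text in reports.items()]
    scored.sort(reverse=True)
    return [report_id for _, report_id in scored[:k]]


def build_prompt(query, reports, k=2):
    """Assemble retrieved reports as context ahead of the user query."""
    context = "\n".join(f"[{rid}] {reports[rid]}"
                        for rid in retrieve(query, reports, k))
    return f"Context reports:\n{context}\n\nQuestion: {query}"
```

For example, `retrieve("breast cancer internal mammary lymph node", REPORTS)` ranks the breast report first. Returning identifiers alongside the text mirrors the chatbot's use of reference identifiers, so a reader could look up the prior case.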


Figure 1. The design of SuperST (A) The schematic representation of SuperST. Here, a, b, c, d, e, and f represent input_1: an input H&E image, a conceptual down-sampling unit, concatenate, a conceptual up-sampling unit, a Gaussian smoothing kernel, and the output of U-Net, respectively. Note that while DIP is a method for image restoration using neural networks, U-Net refers to a specific type of neural network architecture that is often utilized to implement the DIP method. The algorithm first runs F_θ(z) → X_out for num_iter iterations to update the U-Net, and then runs F_θ(z) → X_high once to predict high-resolution images. "F" denotes a conceptual function that links the CNN architectures (b) to the output (f). The matching parts in the top and bottom images are shown in the same color. The detailed information can be found in the STAR Methods section. (B) The comparison of conventional spatial feature plots and high-resolution images made by SuperST from a publicly available breast cancer dataset. The pixels in high-resolution images darker than the 95th percentile of each image are not shown.
Figure 2. Comparison between conventional visualization, SuperST, XFuse, TESLA, and iStar with the IF image (A) The acquisition of Visium ST library for Mouse 4T1 with a recommended tissue preparation protocol, followed by the primary and the fluorescent secondary antibody treatment. H&E, ST, and FL refer to the H&E-stained tissue, the tissue used for acquiring the Visium ST library, and the tissue utilized for fluorescence imaging, respectively. (B) Each Pearson correlation coefficient between IF and each super-resolution algorithm (SuperST, XFuse, TESLA, and iStar) output was represented for Mouse Brain (Rbfox3), Mouse 4T1 (Pecam1), Human Ovarian (PTPRC), and Human Ductal (CD3G). Conv, indicating the conventional visualization of ST data (negative control), was also compared with IF. Super-resolution algorithms outperformed the conventional visualization in most cases. Also, the correlation coefficient was found to be the highest in SuperST compared with the other algorithms.
Generation of super-resolution images from barcode-based spatial transcriptomics by deep image prior

December 2024

·

11 Reads

Cell Reports Methods

Spatially resolved transcriptomics (ST) has revolutionized the field of biology by providing a powerful tool for analyzing gene expression in situ. However, current ST methods, particularly barcode-based methods, have limitations in reconstructing high-resolution images from barcodes sparsely distributed in slides. Here, we present SuperST, an algorithm that enables the reconstruction of dense matrices (higher-resolution and non-zero-inflated matrices) from low-resolution ST libraries. SuperST is based on deep image prior, which reconstructs spatial gene expression patterns as image matrices. Compared with previous methods, SuperST generated output images that more closely resembled immunofluorescence images for given gene expression maps. Furthermore, we demonstrated how one can combine images created by SuperST with computer vision algorithms. In this context, we proposed a method for extracting features from the images, which can aid in spatial clustering of genes. By providing a dense matrix for each gene in situ, SuperST can successfully address the resolution and zero-inflation issue.


Overview of IAMSAM interface panels. a The main visualization panel displays the H&E slides of the ST data, along with the corresponding segmentation masks. These masks highlight different ROIs within the tissue image, allowing users to visually explore and select specific ROIs. After pressing “Run ST analysis,” the downstream analysis panel presents the results of downstream analysis, including b DEG analysis, c enrichment analysis, and d cell type proportion
Analysis of cancer heterogeneity in human breast cancer using IAMSAM. a H&E-stained image of the human breast cancer block A Sect. 1.1 dataset, showing the selected ROIs. b Close-up image of ROI 1, highlighting distinct morphological features. c Close-up image of ROI 2, highlighting distinct morphological features. d IAMSAM analysis showing the identified ROIs based on distinct morphological features. e Box plot showing the top 10 high fold change DEGs in ROI 1 compared to ROI 2. f Bar plot of the top enriched GO terms (adjusted p-value < 0.05) in ROI 1. g Box plot showing the top 10 high fold change DEGs in ROI 2 compared to ROI 1. h Bar plot of the top enriched GO terms (adjusted p-value < 0.05) in ROI 2. i Cell type proportion analysis showing the distribution of cell types within ROI 1 and ROI 2
Workflow of IAMSAM. This figure provides an overview of the workflow of IAMSAM. The gene expression of ST data is preprocessed through spot filtering, gene filtering, and normalization steps. The H&E image of the ST data is segmented using SAM in two different modes: everything-mode and prompt-mode. The selected ROIs are then subjected to downstream analysis, which includes DEG identification, enrichment analysis, and cell type proportion analysis
Main characteristics of IAMSAM. This figure introduces the two main modes of operation in IAMSAM: everything-mode and prompt-mode. a In the everything-mode, IAMSAM generates segmentation masks for the entire tissue images. The mask confidence threshold directly affects the segmentation result, where a higher threshold leads to more precise segmentation but fewer selected masks. b In the prompt-mode, users can provide prompts to the SAM model by drawing rectangle boxes on the visualization panel using the drawing tool provided by Plotly. When users input three rectangle boxes as drawn, IAMSAM returns the corresponding ROIs. c By combining the zoom-in interface with the prompt-mode, IAMSAM allows for the detailed examination of microscopic histology features, enhancing analysis capabilities. d IAMSAM can also process data from platforms like Xenium, following appropriate preprocessing steps. e IAMSAM is applicable to various imaging modalities, including fluorescence imaging, thereby expanding its utility in different experimental settings
Comparative performance analysis of traditional method and IAMSAM method. a Traditional method involves manual drawing of ROIs in Loupe Browser, exporting barcode data, and performing downstream bioinformatic analysis using R or Python. This process is manual, time-consuming, and involves multiple steps and tools. b IAMSAM method utilizes a preprocessing script to create an AnnData file, followed by automated ROI identification and downstream analysis within the IAMSAM framework. This method leverages morphological features, is streamlined and automated, reducing manual effort and increasing reproducibility
IAMSAM: image-based analysis of molecular signatures using the Segment Anything Model

November 2024

·

17 Reads

·

1 Citation

Genome Biology

Spatial transcriptomics is a cutting-edge technique that combines gene expression with spatial information, allowing researchers to study molecular patterns within tissue architecture. Here, we present IAMSAM, a user-friendly web-based tool for analyzing spatial transcriptomics data focusing on morphological features. IAMSAM accurately segments tissue images using the Segment Anything Model, allowing for the semi-automatic selection of regions of interest based on morphological signatures. Furthermore, IAMSAM provides downstream analysis, such as identifying differentially expressed genes, enrichment analysis, and cell type prediction within the selected regions. With its simple interface, IAMSAM empowers researchers to explore and interpret heterogeneous tissues in a streamlined manner.


Spatially Resolved Whole-Transcriptomic and Proteomic Profiling of Lung Cancer and Its Immune Microenvironment According to PD-L1 Expression

September 2024

·

43 Reads

·

1 Citation

Cancer Immunology Research

The expression of PD-L1 on tumor cells (TCs) is used as an immunotherapy biomarker in lung cancer, but heterogeneous intratumoral expression is often observed. Using Digital Spatial Profiling, we performed proteomic and whole-transcriptomic analyses of TCs and immune cells (ICs) in spatially matched areas based on tumor PD-L1 expression and the status of the immune microenvironment. Our findings were validated using immunohistochemistry, The Cancer Genome Atlas, and immunotherapy cohorts. ICs in areas with high PD-L1 expression on TCs showed more features indicative of immunosuppression and exhaustion than ICs in areas with low PD-L1 expression on TCs. TCs highly expressing PD-L1 within immune-inflamed (IF) areas show up-regulation of pro-inflammatory processes, whereas TCs highly expressing PD-L1 within immune-deficient (ID) areas show up-regulation of various metabolic processes. Using differentially expressed genes of TCs between the IF and ID areas, we identified a novel prognostic gene signature for lung cancer. In addition, a high ratio of CD8+ cells to M2 macrophages was found to predict favorable outcomes in patients with PD-L1-expressing lung cancer after immune checkpoint inhibitor therapy. This study demonstrates that TCs and ICs have distinct spatial features within the tumor microenvironment that are related to tumor PD-L1 expression and IC infiltration.


Figure 1 Clinical antitumor efficacy of paclitaxel (PTX) correlates with toll-like receptor 4 (TLR4) signaling and crosspresentation in tumor-associated macrophages (TAMs) with high TLR4 expression in triple-negative breast cancer (TNBC). (A) Spearman correlation of TLR4 expression in cell clusters in the publicly available spatial transcriptomic dataset (Zenodo.4739749). (B) Intercellular heterogeneity of TLR4 expression in TNBC patients was quantified by single-cell RNA sequencing. The uniform manifold approximation and projection for dimension reduction (UMAP) plot of immune cells in TME is distinguished into four clusters (upper left) and TLR4 expression of the cells (upper right). The clusters are as follows: T cell, myeloid cell, B cell, and innate lymphoid cell. The UMAP plot of myeloid cells is distinguished into seven clusters (lower left) and TLR4 expression of the cells (lower right). Macrophage, monocyte, conventional dendritic cell 1(cDC1), conventional dendritic cell 2 (cDC2), myeloid DC (mDC), plasmacytoid DCs (pDC), and mast cell. (C) TLR4 expression in myeloid cell populations from immune cells in tumors of TNBC patients (GSE169246). Samples include baseline (specimen collected before treatment of PTX) and post-treatment of PTX or its combination with the anti-PD-L1 atezolizumab. (D) Intercellular heterogeneity of TLR4 expression in TME of TNBC syngeneic mouse models (EO771, n=4; 4T1, n=6) and syngeneic mouse model (MMTV-PyMT, n=4) were analyzed with flow cytometry. Each column displays group means with individual data points and error bars with SEM. Statistical significance was determined using one-way analysis of variance (ANOVA) followed by Tukey's multiple comparison test. P values indicate significant differences (**p<0.01; ***p<0.001; only statistically significant comparisons shown). 
(E) Gene set enrichment analysis (GSEA) on TAMs in tumors of TNBC patients (GSE169246) who received PTX treatment (post-treatment specimens), including both responders and non-responders. (F) Upregulated signaling pathways from baseline sample data from responders who received PTX+anti-PD-L1 therapy compared with non-responders (pretreatment specimens). The statistical analysis of transcriptome data is detailed in the Methods section.
Figure 4 Paclitaxel (PTX) amplifies the antitumor effects of PD-1 blockade in triple-negative breast cancer (TNBC) through toll-like receptor 4 (TLR4)-dependent mechanisms. (A) EO771 tumor-bearing mice were intraperitoneally injected with clodronate liposome (CLO) for TAM depletion. The control liposome (Con) was injected for comparison. For CD8+ or CD4+ T cell depletion, an anti-mouse CD8 or CD4 antibody was injected according to the described schedule. (B) The tumor growth suppression of PTX was abrogated when tumor-associated macrophages (TAMs) were depleted in EO771 tumor-bearing mice (n=6) (left). CD8+ T cell depletion abolished PTX-induced antitumor efficacy, which was retained on CD4+ T cell depletion (n=6 or 8) (right). (C-D) Multiplex immunohistochemistry (IHC) staining of TME from the acquired tumor tissue. (C) Representative multiplex IHC images of TME from control and PTX-treated mice. (D) PD-L1 expression profile of TME on PTX treatment was quantified through Inform & R, assessed by the cell count per area (mm²) of PD-L1+ myeloid cells in TME (left) (n=3) and the mean fluorescence intensity (MFI) of PD-L1 from total myeloid cells (right) (n=3). (E) The PTX and αPD-1 treatment combination was analyzed on EO771 tumor-bearing mice. PTX was injected accordingly, and αPD-1 was intraperitoneally injected on day 2, day 4, day 6, and day 8. (F) The tumor volume of EO771 tumor-bearing mice on PTX and αPD-1 treatment was measured every other day from the day of initial injection on TLR4 WT (n=6) (left) and TLR4 KO (n=5) (right) mice. (G) The survival rate of EO771 tumor-bearing mice on PTX and αPD-1 treatment was monitored for 16 days (n=10). Each data point indicates means with error bars for SEM (B, F) or each column displays group means with individual data points and error bars with SEM (D). 
Statistical significance was determined using Student's unpaired two-tailed t-test (D), evaluated with tumor volumes at day 10 post-initial injection (B, F), or determined using Mantel-Cox test. P values indicate significant differences (*p<0.05; **p<0.01; ***p<0.001). (H) Schematic illustration of immunotherapeutic effect by enhancing antigen cross-presentation of TAM with PTX treatment combined with αPD-1. The schematic was created with BioRender.com.
Novel insights into paclitaxel’s role on tumor-associated macrophages in enhancing PD-1 blockade in breast cancer treatment

July 2024

·

71 Reads

·

4 Citations

Background Triple-negative breast cancer (TNBC) poses unique challenges due to its complex nature and the need for more effective treatments. Recent studies showed encouraging outcomes from combining paclitaxel (PTX) with programmed cell death protein-1 (PD-1) blockade in treating TNBC, although the exact mechanisms behind the improved results are unclear. Methods We employed an integrated approach, analyzing spatial transcriptomics and single-cell RNA sequencing data from TNBC patients to understand why the combination of PTX and PD-1 blockade showed better response in TNBC patients. We focused on toll-like receptor 4 (TLR4), a receptor of PTX, and its role in modulating the cross-presentation signaling pathways in tumor-associated macrophages (TAMs) within the tumor microenvironment. Leveraging insights obtained from patient-derived data, we conducted in vitro experiments using immunosuppressive bone marrow-derived macrophages (iBMDMs) to validate if PTX could augment the cross-presentation and phagocytosis activities. Subsequently, we extended our study to an in vivo murine model of TNBC to ascertain the effects of PTX on the cross-presentation capabilities of TAMs and its downstream impact on CD8+ T cell-mediated immune responses. Results Data analysis from TNBC patients revealed that the activation of TLR4 and cross-presentation signaling pathways are crucial for the antitumor efficacy of PTX. In vitro studies showed that PTX treatment enhances the cross-presentation ability of iBMDMs. In vivo experiments demonstrated that PTX activates TLR4-dependent cross-presentation in TAMs, improving CD8+ T cell-mediated antitumor responses. The efficacy of PTX in promoting antitumor immunity was elicited when combined with PD-1 blockade, suggesting a complementary interaction. Conclusions This study reveals how PTX boosts the effectiveness of PD-1 inhibitors in treating TNBC. We found that PTX activates TLR4 signaling in TAMs. 
This activation enhances their ability to present antigens, thereby boosting CD8+ T cell antitumor responses. These findings not only shed light on PTX’s immunomodulatory role in TNBC but also underscore the potential of targeting TAMs’ antigen presentation capabilities in immunotherapy approaches.


Spatial Transcriptomics Reveals Spatially Diverse Cancer-Associated Fibroblast in Lung Squamous Cell Carcinoma Linked to Tumor Progression

May 2024

·

25 Reads

While cancer-associated fibroblasts (CAFs) are crucial in influencing tumor growth and immune responses in lung cancer, we still lack a comprehensive understanding of their spatial organization associated with tumor progression and clinical outcomes. This gap highlights the need to elucidate how the intricate spatial arrangement of CAFs affects their interactions within the tumor microenvironment, ultimately shaping cancer progression and patient prognosis. Here, we unveil the spatial diversity of CAFs in lung squamous cell carcinoma (LUSC), a prevalent and aggressive lung cancer type, elucidating their impact on tumor progression and patient outcomes using spatial transcriptomics (ST). Image-based ST data from 33 LUSC patients demonstrated a significant association of spatial interactions of tumor epithelium and CAFs with tumor size and metabolic activity measured by [18F]fluorodeoxyglucose PET. Furthermore, the proximity of fibroblasts to tumor epithelial cells was linked to recurrence-free survival in LUSC patients. By characterizing CAFs based on their spatial relationship, we identified distinct molecular signatures related to spatially distinct fibroblast subpopulations. In addition, barcode-based ST data from 8 LUSC patients revealed spatially overlapping fibroblast regions characterized by upregulated glycolysis pathways. Our study underscores the importance of the complex spatial dynamics of the tumor microenvironment revealed by ST and its implications for patient outcomes in LUSC.


Empowering PET Imaging Reporting with Retrieval-Augmented Large Language Models and Reading Reports Database: A Pilot Single Center Study

May 2024

·

9 Reads

·

3 Citations

Introduction: The potential of Large Language Models (LLMs) in enhancing a variety of natural language tasks in clinical fields includes medical imaging reporting. This pilot study examines the efficacy of a retrieval-augmented LLM system considering zero-shot learning capability of LLMs, integrated with a comprehensive PET reading reports database, in improving reference to previous reports and decision-making. Methods: We developed a custom LLM framework enhanced with retrieval capabilities, leveraging a database encompassing nine years of PET imaging reports from a single center. The system employs vector space embedding of the reports database to facilitate retrieval based on similarity metrics. Queries prompt the system to retrieve embedded vectors, generating context-based answers and identifying similar cases or differential diagnoses from the historical reports database. Results: The system efficiently organized embedded vectors from PET reading reports, showing that imaging reports were accurately clustered within the embedded vector space according to the diagnosis or PET study type. Based on this system, a proof-of-concept chatbot was developed and showed the framework's potential in referencing reports of previous similar cases and identifying exemplary cases for various purposes. Additionally, it demonstrated the capability to offer differential diagnoses, leveraging the vast database to enhance the completeness and precision of generated reports. Conclusions: The integration of a retrieval-augmented LLM with a large database of PET imaging reports represents an advancement in medical reporting within nuclear medicine. By providing tailored, data-driven insights, the system not only improves the relevance of PET report generation but also supports enhanced decision-making and educational opportunities. This study underscores the potential of advanced AI tools in transforming medical imaging reporting practices.


CELLama: Foundation Model for Single Cell and Spatial Transcriptomics by Cell Embedding Leveraging Language Model Abilities

May 2024

·

10 Reads

·

4 Citations

Large-scale single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) have transformed biomedical research into a data-driven field, enabling the creation of comprehensive data atlases. These methodologies facilitate detailed understanding of biology and pathophysiology, aiding in the discovery of new therapeutic targets. However, the complexity and sheer volume of data from these technologies present analytical challenges, particularly in robust cell typing, integration and understanding complex spatial relationships of cells. To address these challenges, we developed CELLama (Cell Embedding Leverage Language Model Abilities), a framework that leverages a language model to transform cell data into ‘sentences’ that encapsulate gene expressions and metadata, enabling universal cellular data embedding for various analyses. CELLama, serving as a foundation model, supports flexible applications ranging from cell typing to the analysis of spatial contexts, independently of manual reference data selection or intricate dataset-specific analytical workflows. Our results demonstrate that CELLama has significant potential to transform cellular analysis in various contexts, from determining cell types across multi-tissue atlases and their interactions to unraveling intricate tissue dynamics.


Abstract 899: Development of a deep learning model for cell type mapping in colorectal cancer using H&E images leveraging image-based spatial transcriptomics data

March 2024

·

13 Reads

Cancer Research

Purpose The tumor microenvironment (TME) is crucial in colorectal cancer as it influences disease progression, treatment response, and patient outcomes, providing valuable insights for personalized therapies and prognostic assessments. Here, we have developed a deep learning model by integrating hematoxylin and eosin (H&E) stained images of colorectal cancer and image-based spatial transcriptomics (Xenium) to infer spatial mapping of cell types in TME only using H&E images. Methods A total of 30 H&E images of colorectal cancer obtained by tissue microarray were registered with image-based spatial transcriptomics data (Xenium). Utilizing a Variational Autoencoder (VAE) based model and leveraging reference single-cell data enables the acquisition of cell types for individual cells in image-based spatial transcriptomics. A convolutional neural network (CNN) model was developed using H&E images as inputs to predict cell types in H&E-stained tissue image patches of colorectal cancer collected from various patients. The model also estimated the cell types from H&E-stained whole-slide tissue images of colorectal cancer from The Cancer Genome Atlas (TCGA-COAD). Results The accuracy of the model's predictions for cell types using H&E image patches was notably high and exhibited a significant concordance with the results obtained through the validation. The Intersection over Union (IoU) metric for image segmentation indicated a value of 0.66 for epithelial cells and 0.44 for T/NK cells. The output of the deep learning model for epithelial cells and T/NK cells from TCGA-COAD tissue images showed a correlation with human-labeled regions of cancer epithelium and immune cells. Conclusions Leveraging image-based spatial transcriptomics, we developed a deep learning model capable of discerning various cell types within the tumor microenvironment solely from H&E images. 
This clinically translatable approach is valuable for investigating the tumor microenvironment to develop biomarkers associated with various cancer therapeutics, particularly immuno-oncology drugs. This approach can yield objective deep learning-based models without human labels for characterizing the tumor microenvironment at single-cell resolution, particularly regarding spatial immune distribution. Citation Format: Seungho Cook, Dongjoo Lee, Myunghyun Lim, Jae Eun Lee, Daeseung Lee, Hyung-Jun Im, Jung-Soo Pyo, Kwon Joong Na, Hongyoon Choi. Development of a deep learning model for cell type mapping in colorectal cancer using H&E images leveraging image-based spatial transcriptomics data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 899.
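
For readers unfamiliar with the Intersection over Union (IoU) metric reported in the abstract above, the following is a minimal sketch of how IoU is commonly computed for a pair of binary segmentation masks. It is not the authors' evaluation code: the function name `mask_iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
from itertools import chain


def mask_iou(pred_mask, true_mask):
    """Intersection over Union for two binary masks given as nested lists.

    Hypothetical helper for illustration only; the abstract does not
    describe how its IoU values were computed.
    """
    # Flatten each 2D mask into a list of booleans (nonzero means foreground).
    pred = [bool(v) for v in chain.from_iterable(pred_mask)]
    true = [bool(v) for v in chain.from_iterable(true_mask)]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")

    # Pixels labeled foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, true))
    # Pixels labeled foreground in at least one mask.
    union = sum(p or t for p, t in zip(pred, true))

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy example: 2 overlapping foreground pixels out of 4 in the union.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(mask_iou(predicted, reference))
```

An IoU of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels, so the reported values of 0.66 and 0.44 indicate partial overlap.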


Citations (5)


... environments (AlGhadban et al., 2023; Alonso et al., 2024; Choi et al., 2024; ...). Specific applications were identified in ophthalmology diagnostics and the analysis of multimodal patient data, suggesting broader opportunities for integrating RAG AI into medical training and diagnostic processes (Upadhyaya et al., 2024). ...

Reference:

Bridging AI and Healthcare: A Scoping Review of Retrieval-Augmented Generation - Ethics, Bias, Transparency, Improvements, and Applications
Empowering PET imaging reporting with retrieval-augmented large language models and reading reports database: a pilot single center study

European Journal of Nuclear Medicine and Molecular Imaging

... NSUN2 had also been verified as a glucose sensor, and its activation by glucose promotes tumorigenesis and resistance to immunotherapy by sustaining TREX2 expression, thereby inhibiting the cGAS/STING pathway [34]. Numerous studies have demonstrated that patients exhibiting high PD-L1 expression on tumor cells tend to benefit from PD-1/PD-L1 antibody therapy [23,38,39]. Therefore, identifying strategies to modulate PD-L1 expression is of paramount importance. ...

Spatially Resolved Whole-Transcriptomic and Proteomic Profiling of Lung Cancer and Its Immune Microenvironment According to PD-L1 Expression
  • Citing Article
  • September 2024

Cancer Immunology Research

... For instance, paclitaxel, a compound extracted from the bark of the Pacific yew tree (Taxus brevifolia), promotes microtubule polymerization, inhibits their depolymerization, and disrupts the mitotic process in cancer cells. This compound has demonstrated efficacy in treating various cancers, including ovarian, breast, and lung cancer, thereby playing a pivotal role in oncological treatments [172]. Camptothecin, an alkaloid extracted from the Chinese dove tree (Camptotheca acuminata), functions primarily to inhibit topoisomerase I, thereby disrupting normal DNA replication and transcription processes, ultimately leading to apoptosis in cancer cells. ...

Novel insights into paclitaxel’s role on tumor-associated macrophages in enhancing PD-1 blockade in breast cancer treatment

... Nuclear medicine can benefit from LLMs in many of the same ways as radiology, including applications in reporting, medical record navigation, and education. Currently, there are few LLM studies that focus specifically on nuclear medicine, but initial studies have found that LLMs perform well at classifying nuclear medicine reports (53), generating impressions from PET findings (42), and retrieving examinations (58). Furthermore, in the emerging era of theranostics, it is likely that LLMs will be useful in summarizing complex medical records and extracting structured data (e.g., patient outcomes), which can ultimately support research efforts in validating and optimizing approaches to radiopharmaceutical therapy. ...

Empowering PET Imaging Reporting with Retrieval-Augmented Large Language Models and Reading Reports Database: A Pilot Single Center Study
  • Citing Preprint
  • May 2024

... These regulatory mechanisms ensure the proper assembly and function of microtubules in response to cellular demands. Abnormalities in TUBB6 expression are associated with cancer progression, including gliomas [15][16][17]. Dysregulation of TUBB6 can disrupt microtubule dynamics and stability, potentially leading to uncontrolled cell division and tumorigenesis. Elevated or altered TUBB6 expression in tumors has been associated with increased tumor aggressiveness and poor prognosis. ...

Quantitative proteomics identifies TUBB6 as a biomarker of muscle‐invasion and poor prognosis in bladder cancer