Tahsin Kurc’s research while affiliated with Stony Brook University and other places


Publications (339)


Artificial intelligence and the interplay between cancer and immunity
  • Chapter

January 2025 · 6 Reads · Tahsin Kurc · Joel Haskin Saltz


Figure 1. t-SNE visualization [66] of image representation from different methods. (a) Natural-Image-SSL. (b) Histopath-Image-SSL. (c) Gene-Image CL. (d) Gene-Image CL with ranking loss. (e) RankByGene. We first use K-means clustering to assign a class label to each spot based on gene expression values. Then, we apply t-SNE to visualize the image features corresponding to each spot and color them according to their assigned class labels. A higher v-score [57] indicates that the image representation more precisely captures the distribution of gene expression values. Adding ranking loss (d) enhances the separability between image features of different classes. Our method (e), with both ranking loss and distillation, shows the highest separability.
Figure 2. Overview of our RankByGene framework. The framework begins with WSI tiling, where WSIs are divided into tiles, each paired with a gene profile. In the feature extraction part, weak and strong augmentations of the image tiles are processed through a teacher encoder and a student encoder, while a gene encoder extracts features from the gene profile. The feature alignment stage ensures that weakly and strongly augmented image features are aligned through intra-modal distillation loss and that the image and gene features are aligned using gene-image contrastive loss. Meanwhile, our proposed cross-modal ranking consistency loss maintains consistent similarity ranking orders across the two modalities.
Figure 4. Visualization of PTGES3 gene expression predictions from different methods, with all values normalized to the range of 0 to 1. Compared to the baseline, our predictions show the closest match to the ground truth.
Figure 5. Rank accuracy for different methods during training.
Figure 6. Additional visualizations of gene expression predictions for different genes using various methods, with all values normalized to the range of 0 to 1. From the first row to the fourth row: TUBA1C, ESRP1, MAL2, and RAB2A. Compared to the baseline, our predictions show the closest alignment with the ground truth.


RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency
  • Preprint
  • File available

November 2024 · 25 Reads

Spatial transcriptomics (ST) provides essential spatial context by mapping gene expression within tissue, enabling detailed study of cellular heterogeneity and tissue organization. However, aligning ST data with histology images poses challenges due to inherent spatial distortions and modality-specific variations. Existing methods largely rely on direct alignment, which often fails to capture complex cross-modal relationships. To address these limitations, we propose a novel framework that aligns gene and image features using a ranking-based alignment loss, preserving relative similarity across modalities and enabling robust multi-scale alignment. To further enhance the alignment's stability, we employ self-supervised knowledge distillation with a teacher-student network architecture, effectively mitigating disruptions from high dimensionality, sparsity, and noise in gene expression data. Extensive experiments on gene expression prediction and survival analysis demonstrate our framework's effectiveness, showing improved alignment and predictive performance over existing methods and establishing a robust tool for gene-guided image representation learning in digital pathology.
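As a rough illustration of what "preserving relative similarity across modalities" could mean in practice, the sketch below measures how often the similarity ordering of spot pairs agrees between image features and gene features. It is not the RankByGene loss itself, which the abstract does not specify; the function names, the use of cosine similarity, and the triplet-counting scheme are all assumptions made for this example.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_agreement(image_feats, gene_feats):
    """Fraction of (anchor, j, k) triplets ordered the same way in both modalities.

    Hypothetical helper: for each anchor spot i, it checks whether spot j is
    closer than spot k in image space exactly when it is closer in gene space.
    Ties in either modality count as disagreement.
    """
    n = len(image_feats)
    agree = 0
    total = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j, k in combinations(others, 2):
            d_img = cosine(image_feats[i], image_feats[j]) - cosine(image_feats[i], image_feats[k])
            d_gene = cosine(gene_feats[i], gene_feats[j]) - cosine(gene_feats[i], gene_feats[k])
            total += 1
            if d_img * d_gene > 0:
                agree += 1
    return agree / total if total else 0.0
```

A value near 1 would indicate that the two modalities rank neighbors consistently; a differentiable surrogate of such a quantity is one plausible way to build a training loss, though the paper's actual formulation may differ.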


TCIA Ingestion of imaging data from healthcare organizations (HCO) and sharing of files with the Linkage Honest Broker. An HCO Token Package of the site's clinical data is generated and checked‐in via SFTP to the Linkage Honest Broker (LHB). An HCO also generates a Token Package for images and submits to TCIA to update the Token Package and check‐in to LHB.
Step 1: A site generates its own viral sequencing data from collected samples and submits it to NIH GenBank for validation and storage. Step 2: NIH GenBank validates the data and produces accession numbers that are sent back to the submitting site. Steps 3 and 4: The site submits two files. One goes to the N3C and contains the submitting site ID, the patient's synthetic ID (PSEUDO_ID), the sample collection date, and the viral variant name; the other goes to the Honest Broker (HB) and contains the submitting site ID, the accession numbers, and the patient's synthetic ID (PSEUDO_ID).
Linking The Cancer Imaging Archive and GenBank to the National Clinical Cohort Collaborative

September 2024 · 23 Reads

Objective: This project demonstrates the feasibility of connecting medical imaging data and features, as well as SARS‐CoV‐2 genome variants, with clinical data in the National Clinical Cohort Collaborative (N3C) repository to accelerate integrative research on detection, diagnosis, and treatment of COVID‐19‐related morbidities. The N3C curated a rich collection of aggregated and de‐identified electronic health records (EHR) data of over 18 million patients, including 7.5 million COVID‐positive patients, seen at hospitals across the United States. Medical imaging data and variant samples are important data modalities used in the study of COVID‐19. Materials and Methods: Imaging data and features are hosted on The Cancer Imaging Archive (TCIA), and sequenced variant samples are analyzed and stored at the NIH GenBank. The University of Arkansas for Medical Sciences (UAMS) published the first COVID‐19 data set of 105 patients on TCIA and 37 patients on GenBank. We developed a process to link imaging data, genomic variants, and N3C EHR data through Privacy Preserving Record Linkage (PPRL), using de‐identified cryptographic hashes to match records associated with the same individuals without using patient identifiers. Results: The PPRL techniques were piloted using clinical and imaging data sets provided by UAMS. Developed software components and processes executed properly, and linked data were returned and processed for visualization. Conclusion: Linking across clinical data sources at the patient level provides opportunities to gain insights from data that may not be known otherwise. The PPRL prototype and the pilot serve as a model to link disparate and diverse data repositories to enhance clinical research.
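The general idea behind matching records through cryptographic hashes can be sketched as follows. This is a minimal illustration only: the abstract does not describe the actual token recipe, so the choice of identifier fields, the normalization, the use of HMAC-SHA256, and the names `make_token` and `link_records` are all assumptions made for this example.

```python
import hashlib
import hmac


def make_token(first_name, last_name, birth_date, secret_key):
    """Build a keyed hash from normalized identifiers (hypothetical token recipe)."""
    # Normalize so that trivial formatting differences do not break a match.
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(clinical_tokens, imaging_tokens):
    """Map imaging pseudo-IDs to clinical pseudo-IDs whose tokens are identical.

    Both arguments are dicts of {pseudo_id: token}; only tokens are compared,
    so the linking party never sees the underlying identifiers.
    """
    by_token = {token: pseudo_id for pseudo_id, token in clinical_tokens.items()}
    return {
        imaging_id: by_token[token]
        for imaging_id, token in imaging_tokens.items()
        if token in by_token
    }
```

In a setup like this, each site would compute tokens locally with a shared secret and send only the tokens and synthetic IDs to the party doing the linkage, which is the role the figure captions assign to the honest broker.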


Fig. 2. Validation of stain vector Euclidean distance as an alternative to published histogram Wasserstein distance analyses. The prior graphical method utilized histogram Wasserstein distance for optimizing WSI selection as a normalization reference. Comparative analysis of both distances across WSI pairs demonstrates a strong Pearson correlation (r=0.99), thus endorsing the stain vector Euclidean distance as a valid metric.
Optimized Whole-Slide-Image H&E Stain Normalization: A Step Towards Big Data Integration in Digital Pathology

September 2024 · 52 Reads

IEEE Open Journal of Engineering in Medicine and Biology

In the medical diagnostics domain, pathology and histology are pivotal for the precise identification of diseases. Digital histopathology, enhanced by automation, facilitates the efficient analysis of the massive number of biopsy images produced on a daily basis, streamlining the evaluation process. This study focuses on Stain Color Normalization (SCN) within a Whole-Slide Image (WSI) cohort, aiming to reduce batch biases. Building on a published graphical method, this research demonstrates a mathematical, population- or data-driven method that optimizes the dependency on the number of reference WSIs and corresponding aggregate sums, thereby increasing SCN process efficiency. This method expedites the analysis of color convergence 50-fold by using stain vector Euclidean distance analysis and cuts the requirement for reference WSIs by more than half. The approach is validated through a tripartite methodology: 1) stain vector Euclidean distance analysis, 2) distance computation timing, and 3) qualitative and quantitative assessments of SCN across cancer tumor regions of interest. The results validate the performance of the data-driven SCN method and thus its potential to enhance the precision and reliability of computational pathology analyses. This advancement is poised to enhance diagnostic processes, therapeutic strategies, and patient prognosis.
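To make the stain vector Euclidean distance concrete, here is a minimal sketch of how such a distance between two slides could be computed, together with the Pearson correlation used to compare two distance measures. The representation of a slide as a small list of stain vectors and the function names are assumptions for illustration; the paper's actual stain estimation and aggregation steps are not shown.

```python
import math


def stain_vector_distance(stains_a, stains_b):
    """Euclidean distance between two slides' stain vectors.

    Each argument is a list of stain vectors (for example, one for hematoxylin
    and one for eosin), given in the same order and with the same length.
    """
    flat_a = [x for vec in stains_a for x in vec]
    flat_b = [x for vec in stains_b for x in vec]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Comparing a handful of numbers per slide pair is far cheaper than comparing full color histograms, which is consistent with the speedup the abstract reports, although the exact source of the 50-fold figure is not detailed here.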



Analysis of Keratin 17 relative to the PDAC immune microenvironment. a Overall fraction of immune cell types averaged across all cases (n = 235). b Spearman correlation between manual and digital K17 scoring across entire tumor sections. c Overall immune cell stacked bar plot including CD4+ T cell, CD8+ T cell, CD16+ macrophage, and CD163+ macrophage density (cells/mm²). The right Y-axis depicts the overall K17 score within each tumor. d Development of a digital scoring system focused on spatial relationships between peritumoral and intratumoral immune cells and K17. Intratumoral zones were defined as those that directly contacted a tumor cell, while peritumoral zones included only immune cells within 25 μm of the closest tumor cell boundary
K17 impacts intratumoral and peritumoral T cells and macrophages. a–h T cell counts in peritumoral and intratumoral K17-positive and K17-negative regions. a Peritumoral CD8+ T cells. c Intratumoral CD8+ T cells. e Peritumoral CD4+ T cells. g Intratumoral CD4+ T cells. i–p Macrophage counts in peritumoral and intratumoral K17-negative regions relative to K17-positive regions. i Peritumoral CD16+ macrophages; k intratumoral CD16+ macrophages; m peritumoral CD163+ macrophages; o intratumoral CD163+ macrophages. Representative mIHC images for each panel highlight intratumoral and peritumoral b, d CD8+ T cells (purple); f, h CD4+ T cells (red); j, l CD16+ macrophages (yellow); and n, p CD163+ macrophages (green) relative to K17-positive tumor cells (brown) and K17-negative tumor cells (teal). Note that immune cell ratios are normalized to counts in K17-positive zones, and the relative height of the bars reflects the magnitude of differences between ratios in K17-negative versus K17-positive zones, not relative differences in overall immune cell counts
The impact of K17 on CD8+ T Cells is independent of neoadjuvant therapy. a, b Peritumoral and intratumoral CD8+ T cell density ratios in cases that did not receive neoadjuvant treatment and, c, d cases treated with neoadjuvant treatment
The impact of K17 on CD8+ T cells is independent of PDAC stage, grade, and lymph node status. Immune cell ratios in peritumoral and intratumoral K17-negative regions relative to K17-positive regions, ordered based on the density of immune cells in K17-positive zones. The inverse correlation between K17 expression and peritumoral and intratumoral CD8+ T cells is independent of a–d stage, e–h tumor grade, and i–l lymph node status
CD8+ T cells are increased in K17-negative regions, regardless of mutation status. Immune cell ratios in peritumoral and intratumoral K17-negative regions relative to K17-positive regions and mutational status of KRAS, p53, SMAD4, and CDKN2A. a OncoPrint [8, 12, 18] depicting the most frequently mutated genes in the KYT cohort. b–q Wild type versus mutant KRAS, p53, SMAD4, and CDKN2A
Keratin 17 modulates the immune topography of pancreatic cancer

May 2024 · 49 Reads · 3 Citations

Journal of Translational Medicine

Background: The immune microenvironment impacts tumor growth, invasion, metastasis, and patient survival and may provide opportunities for therapeutic intervention in pancreatic ductal adenocarcinoma (PDAC). Although never studied as a potential modulator of the immune response in most cancers, Keratin 17 (K17), a biomarker of the most aggressive (basal) molecular subtype of PDAC, is intimately involved in the histogenesis of the immune response in psoriasis, basal cell carcinoma, and cervical squamous cell carcinoma. Thus, we hypothesized that K17 expression could also impact the immune cell response in PDAC, and that uncovering this relationship could provide insight to guide the development of immunotherapeutic opportunities to extend patient survival. Methods: Multiplex immunohistochemistry (mIHC) and automated image analysis based on novel computational imaging technology were used to decipher the abundance and spatial distribution of T cells, macrophages, and tumor cells, relative to K17 expression in 235 PDACs. Results: K17 expression had profound effects on the exclusion of intratumoral CD8+ T cells and was also associated with decreased numbers of peritumoral CD8+ T cells, CD16+ macrophages, and CD163+ macrophages (p < 0.0001). The differences in intratumoral and peritumoral CD8+ T cell abundance were not impacted by neoadjuvant therapy, tumor stage, grade, lymph node status, histologic subtype, or KRAS, p53, SMAD4, or CDKN2A mutations. Conclusions: Thus, K17 expression correlates with major differences in the immune microenvironment that are independent of any tested clinicopathologic or tumor intrinsic variables, suggesting that targeting K17-mediated immune effects on the immune system could restore the innate immunologic response to PDAC and might provide novel opportunities to restore immunotherapeutic approaches for this most deadly form of cancer.


Metrics reloaded: recommendations for image analysis validation

February 2024 · 245 Reads · 239 Citations

Nature Methods

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.


Warehouse workflow.
The flowchart of the designed two-step CBR system.
An Intelligent Search & Retrieval System (IRIS) and Clinical and Research Repository for Decision Support Based on Machine Learning and Joint Kernel-based Supervised Hashing

February 2024 · 29 Reads

Cancer Informatics

Large-scale, multi-site collaboration is becoming indispensable for a wide range of research and clinical activities in oncology. To facilitate the next generation of advances in cancer biology, precision oncology and the population sciences, it will be necessary to develop and implement data management and analytic tools that empower investigators to reliably and objectively detect, characterize and chronicle the phenotypic and genomic changes that occur during the transformation from the benign to cancerous state and throughout the course of disease progression. To facilitate these efforts, it is incumbent upon the informatics community to establish the workflows and architectures that automate the aggregation and organization of a growing range and number of clinical data types and modalities, ranging from new molecular and laboratory tests to sophisticated diagnostic imaging studies. In an attempt to meet those challenges, leading health care centers across the country are making steep investments to establish enterprise-wide data warehouses. A significant limitation of many data warehouses, however, is that they are designed to support only alphanumeric information. In contrast to those traditional designs, the system that we have developed supports automated collection and mining of multimodal data including genomics, digital pathology and radiology images. In this paper, our team describes the design, development and implementation of a multi-modal Clinical & Research Data Warehouse (CRDW) that is tightly integrated with a suite of computational and machine-learning tools to provide actionable insight into the underlying characteristics of the tumor environment that would not be revealed using standard methods and tools.
The system features a flexible Extract, Transform and Load (ETL) interface that enables it to aggregate data originating from different clinical and research sources, depending on the specific EHR and other data sources used at a given deployment site.


Keratin 17 modulates the immune topography of pancreatic cancer

January 2024 · 35 Reads

Background: The immune microenvironment impacts tumor growth, invasion, metastasis, and patient survival and may provide opportunities for therapeutic intervention in pancreatic ductal adenocarcinoma (PDAC). Although never studied as a potential modulator of the immune response in most cancers, Keratin 17 (K17), a biomarker of the most aggressive (basal) molecular subtype of PDAC, is intimately involved in the histogenesis of the immune response in psoriasis, basal cell carcinoma, and cervical squamous cell carcinoma. Thus, we hypothesized that K17 expression could also impact the immune cell response in PDAC, and that uncovering this relationship could provide insight to guide the development of immunotherapeutic opportunities to extend patient survival. Methods: Multiplex immunohistochemistry (mIHC) and automated image analysis based on novel computational imaging technology were used to decipher the abundance and spatial distribution of T cells, macrophages, and tumor cells, relative to K17 expression in 235 PDACs. Results: K17 expression had profound effects on the exclusion of intratumoral CD8+ T cells and was also associated with decreased numbers of peritumoral CD8+ T cells, CD16+ macrophages, and CD163+ macrophages (p < 0.0001). The differences in intratumoral and peritumoral CD8+ T cell abundance were not impacted by neoadjuvant therapy, tumor stage, grade, lymph node status, histologic subtype, or KRAS, p53, SMAD4, or CDKN2A mutations. Conclusions: Thus, K17 expression correlates with major differences in the immune microenvironment that are independent of any tested clinicopathologic or tumor intrinsic variables, suggesting that targeting K17-mediated immune effects on the immune system could restore the innate immunologic response to PDAC and might provide novel opportunities to restore immunotherapeutic approaches for this most deadly form of cancer.


Citations (61)


... To further elucidate the molecular characteristics of High-Risk cells, we performed differential gene expression analysis between High-Risk and Background subpopulations, identifying 323 upregulated genes (Fig. 2e). Among the upregulated genes, several were strongly associated with aggressive tumor phenotypes [44][45][46][47][48], displaying significantly higher expression in High-Risk cells while being markedly downregulated in Background cells (Fig. 2f). For example, KRT17, which is overexpressed in PDAC [48], and C15orf48 have been linked to poor clinical outcomes [44,45,47], while CEACAM6 has been implicated in promoting cancer cell invasion and metastasis in PDAC [46]. ...

Reference:

SIDISH Identifies High-Risk Disease-Associated Cells and Biomarkers by Integrating Single-Cell Depth and Bulk Breadth
Keratin 17 modulates the immune topography of pancreatic cancer

Journal of Translational Medicine

... Finally, with the success of latent diffusion models (LDMs) (Rombach et al., 2022), the usage of foundation models should be investigated. As LDMs have also been developed in the medical domain (Yellapragada et al., 2024), it is of high interest to study the extent to which they can capture semantic concepts and potentially improve upon existing methods. With the outlook of a recent, powerful foundational model that is based on flow matching (Esser et al., 2024), their usage in a conditional setting for image segmentation can be explored. ...

PathLDM: Text conditioned Latent Diffusion Model for Histopathology
  • Citing Conference Paper
  • January 2024

... The Jaccard index is calculated as $\mathrm{JAC} = \frac{TP}{TP + FP + FN}$. In the nuclei segmentation literature, the Jaccard index has sometimes been mistakenly referred to as average precision (AP), which can cause confusion with the area under the precision recall curve [13]. To ensure clarity, we consistently refer to this metric as the Jaccard index. ...
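The Jaccard index quoted in the excerpt above, true positives divided by the sum of true positives, false positives, and false negatives, can be computed with a few lines of code. The sketch below is illustrative only: the function names are made up for this example, and raising an error when all three counts are zero is an assumption, since conventions for that empty case vary.

```python
def jaccard_index(tp, fp, fn):
    """Jaccard index from counts: TP / (TP + FP + FN)."""
    denominator = tp + fp + fn
    if denominator == 0:
        # Assumption: treat the empty case as undefined rather than pick 0 or 1.
        raise ValueError("Jaccard index is undefined when TP + FP + FN == 0")
    return tp / denominator


def jaccard_from_masks(predicted, truth):
    """Jaccard index for two flat binary masks of equal length."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    return jaccard_index(tp, fp, fn)
```

True negatives do not appear in the formula, which is why the index is unaffected by how much background surrounds the object.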

Metrics reloaded: recommendations for image analysis validation
  • Citing Article
  • February 2024

Nature Methods

... However, the advent of digital image analysis (DIA) and machine learning (ML) technologies has broadened the scope of artificial intelligence (AI) in this field. Over the past few years, a slew of deep learning (DL) based whole slide image (WSI) analysis tools such as QuPath [1], TIA Toolbox [2], MONAI [3], SlideFlow [4], PHARAOH [5], and WSInfer [6] have been introduced. ...

Open and reusable deep learning for pathology with WSInfer and QuPath

npj Precision Oncology

... To alleviate the burden of manual annotation and enhance the efficiency of analysis, there has been growing interest in utilizing generative models. Early works use Generative Adversarial Networks (GANs) [20] for automatic generation of pathology images [2,10,34]. In recent years, diffusion models [11,31,45,50,61,73,74] have emerged as much more reliable alternatives, generating accurate, high-resolution histopathology images [4,23,48,51,71]. ...

Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
  • Citing Conference Paper
  • June 2023

Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition

... By aggregating the diagnostic information from each patch, patch-based methods can effectively reduce the impact of artifacts or blurred areas within low-quality slides [53]. Combined with hybrid models that integrate classical image processing techniques and CNNs, patch-based approaches have shown promise for achieving high diagnostic accuracy in low-quality imaging environments [54]. Collectively, these methodologies offer a comprehensive toolkit for improving AI-driven diagnostics in resource-constrained settings [38,39]. ...

ChampKit: A framework for rapid evaluation of deep neural networks for patch-based histopathology classification
  • Citing Article
  • May 2023

Computer Methods and Programs in Biomedicine

... To provide a more comprehensive evaluation of the segmentation models, additional performance metrics beyond Mask IoU were included. While Mask IoU is commonly used to assess the overall overlap between predicted segmentation masks and the ground truth, it does not fully capture boundary accuracy, which is critical in medical imaging applications such as liver segmentation [47,48]. Trimap IoU was introduced to specifically evaluate segmentation performance near object boundaries. ...

Understanding metric-related pitfalls in image analysis validation

... Digital pathology has transformed cancer diagnosis and prognosis prediction through computational methods that can extract complex patterns directly from image data, enabling sophisticated tasks such as survival prediction, prognosis assessment, and treatment response estimation [5]. Nevertheless, the development and deployment of these computational models face significant challenges due to the labor-intensive nature of data collection and annotation [6], particularly when developing specialized models for the vast array of diagnostic categories and rare diseases encountered in clinical practice. ...

Effective and Efficient Active Learning for Deep Learning Based Tissue Image Analysis

Bioinformatics

... Integrating artificial intelligence (AI) into medical diagnostics has considerably enhanced diagnostic accuracy and consistency. This integration is particularly evident in the realm of chest X-ray interpretation, where AI technologies, notably Convolutional Neural Networks (CNNs) such as DenseNet121, have shown considerable capability in providing detailed and consistent interpretations [6][7][8][9][10][11]. The effectiveness of these AI models is primarily attributed to the extensive public databases available, including Chexpert, NIH, Padchest, and MIMIC [12][13][14]. ...

Author Correction: Federated learning enables big data for rare cancer boundary detection

... Different stain normalization techniques like Macenko and Vahadane methods were proposed to enhance model performance in classifying metastatic tissue slides. Additionally, it was confirmed in (37) that utilizing multiple slides to construct a representative reference for color normalization has shown promising results in improving computational pathology robustness and integrity. Color stain normalization plays a crucial role in tasks like image retrieval, where differences in colorization can impact the accuracy of analysis. ...

Robust Image Population based Stain Color Normalization: How Many Normalization Reference Slides Are Enough?

IEEE Open Journal of Engineering in Medicine and Biology